title
🔥Python Data Analysis Projects for 2022 | Data Analysis With Python | Python Training | Simplilearn
description
🔥Enroll for Free Python Course & Get Your Completion Certificate: https://www.simplilearn.com/learn-python-basics-free-course-skillup?utm_campaign=PythonDataAnalysisProjectsFor202213June22&utm_medium=ShortsDescription&utm_source=youtube
This Python data analysis project will help you build current, industry-relevant data analytics skills. You will get hands-on experience with the tools and libraries used for exploratory data analysis, along with the fundamentals, tips, and tricks needed to become proficient in data analytics using Python. This video on Python Data Analysis Projects for 2022 shows how to use real-world data to perform exploratory data analysis. You will analyze and visualize the coronavirus and Olympics datasets with libraries such as NumPy, Pandas, Matplotlib, and Seaborn. These projects will give you experience in tackling real-world problems.
✅Subscribe to our Channel to learn more programming languages: https://bit.ly/3eGepgQ
⏩ Check out the Python for beginners playlist: https://www.youtube.com/playlist?list=PLEiEAq2VkUUJO27b6PyoSd7CJjWIPyHYO
#PythonDataAnalysisProjectsFor2022 #DataAnalysisWithPython #PythonTraining #PythonTutorial #PythonProgramming #Python #Simplilearn
What is Python?
Python is a high-level, object-oriented programming language that Guido van Rossum began developing in 1989 and first released in 1991. Python is often called a "batteries included" language because of its comprehensive standard library. A fun fact about Python is that its name was taken from the BBC comedy show "Monty Python's Flying Circus". Python is widely used these days in data analytics, machine learning, and web development. Python often lets you write programs in fewer lines of code than many other programming languages. Python as a programming language is growing rapidly. It's the right time to get trained in Python.
Following are the standard or built-in data types of Python:
1. Numeric data types
2. Text data type
3. Sequence data type
4. Mapping data type
5. Set data type
6. Boolean data type
7. Binary data type
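As a rough illustration of the seven categories above, the sketch below pairs each category with concrete built-in types. The grouping follows the conventional one (for example, list, tuple, and range as sequence types); the description itself does not name the individual types, and the `examples` and `type_names` variables are made up for this example.

```python
# One or more sample values for each built-in category listed above.
# The variable names are illustrative, not part of any library.
examples = {
    "numeric": [1, 2.5, 3 + 4j],
    "text": ["hello"],
    "sequence": [[1, 2], (1, 2), range(3)],
    "mapping": [{"a": 1}],
    "set": [{1, 2}, frozenset({1, 2})],
    "boolean": [True],
    "binary": [b"ab", bytearray(b"ab"), memoryview(b"ab")],
}

# Ask Python for the class name of every sample value.
type_names = {
    category: [type(value).__name__ for value in values]
    for category, values in examples.items()
}

for category, names in type_names.items():
    print(f"{category:>8}: {', '.join(names)}")
```

Calling `type()` on a value is a quick way to check which category it falls into while exploring a dataset.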
A programming language needs support for numbers to carry out calculations. In Python, numbers are categorized into different data types, and these types are implemented as classes. There are three numeric types in Python: int for integers, float for floating-point (decimal) numbers, and complex for complex numbers.
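A minimal sketch of the three numeric types follows. It shows that each value is an instance of a class and that mixing types in arithmetic promotes the result to the wider type. The variable names are chosen only for illustration.

```python
import numbers

whole = 42            # int
fraction = 3.14       # float
z = complex(2, -3)    # complex, same value as 2 - 3j

# Every numeric value is an instance of a class.
for value in (whole, fraction, z):
    print(value, type(value), isinstance(value, numbers.Number))

# Mixed arithmetic promotes to the wider type.
mixed_sum = whole + fraction     # int + float gives a float
mixed_product = fraction * z     # float * complex gives a complex
print(type(mixed_sum).__name__, type(mixed_product).__name__)

# A complex number exposes its parts as floats.
print(z.real, z.imag)
```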
Simplilearn’s Python Training Course is an all-inclusive program that will introduce you to the Python development language and expose you to the essentials of object-oriented programming, web development with Django, and game development. Python has surpassed Java as the top language used to introduce U.S. students to programming and computer science. This course will give you hands-on development experience and prepare you for a career as a professional Python programmer.
What is this course about?
The All-in-One Python course enables you to become a professional Python programmer. Any aspiring programmer can learn Python from the basics and go on to master web development & game development in Python. Gain hands-on experience creating a flappy bird game clone & website functionalities in Python.
What are the course objectives?
By the end of this online Python training course, you will be able to:
1. Internalize the concepts & constructs of Python
2. Learn to create your own Python programs
3. Master Python Django & advanced web development in Python
4. Master PyGame & game development in Python
5. Create a flappy bird game clone
The Python training course is recommended for:
1. Any aspiring programmer who wants to master Python
2. Any aspiring web developer or game developer looking to meet their training needs
Learn more at: https://www.simplilearn.com/mobile-and-software-development/python-development-training?utm_campaign=PythonDataAnalysisProjectsFor202213June22&utm_medium=ShortsDescription&utm_source=youtube
For more information about Simplilearn’s courses, visit:
- Facebook: https://www.facebook.com/Simplilearn
- Twitter: https://twitter.com/simplilearn
- LinkedIn: https://www.linkedin.com/company/simplilearn/
- Website: https://www.simplilearn.com
- Instagram: https://www.instagram.com/simplilearn_elearning
- Telegram Mobile: https://t.me/simplilearnupdates
- Telegram Desktop: https://web.telegram.org/#/im?p=@simplilearnupdates
Get the Simplilearn app: https://simpli.app.link/OlbFAhqMqgb
🔥🔥 Interested in Attending Live Classes? Call Us: IN - 18002127688 / US - +18445327688
detail
{'title': '🔥Python Data Analysis Projects for 2022 | Data Analysis With Python | Python Training | Simplilearn', 'heatmap': [{'end': 10078.689, 'start': 9924.176, 'weight': 1}], 'summary': 'Covers hands-on projects on covid-19 data analysis using python and tableau, with over 22 crore global infections and 4.5 million deaths, and also includes analysis of covid-19 vaccination, state-wise covid-19 analysis, covid case trends, vaccination analysis, global covid-19 data visualization, olympics data analysis, f1 data analysis, and ipl dataset exploration and cleaning.', 'chapters': [{'end': 400.7, 'segs': [{'end': 77.398, 'src': 'embed', 'start': 26.283, 'weight': 0, 'content': [{'end': 30.869, 'text': 'Hi everyone! Welcome to this video tutorial on Python Data Analysis project for 2022.', 'start': 26.283, 'duration': 4.586}, {'end': 35.915, 'text': 'In this video, we will be covering two interesting hands-on projects using Python programming.', 'start': 30.869, 'duration': 5.046}, {'end': 40.441, 'text': 'You will learn how to use real-world data to perform data analysis and data visualization.', 'start': 36.456, 'duration': 3.985}, {'end': 43.765, 'text': 'The two projects are based on Coronavirus and Olympics data.', 'start': 40.901, 'duration': 2.864}, {'end': 52.282, 'text': 'You will learn to collect, analyze, clean, manipulate and visualize data with the help of Python libraries such as NumPy, Pandas,', 'start': 44.457, 'duration': 7.825}, {'end': 53.483, 'text': 'Matplotlib and Seaborn.', 'start': 52.282, 'duration': 1.201}, {'end': 58.766, 'text': 'These two projects will give you the idea to solve real-world problems using exploratory data analysis.', 'start': 53.983, 'duration': 4.783}, {'end': 61.088, 'text': "So let's get started with our first project.", 'start': 59.387, 'duration': 1.701}, {'end': 66.355, 'text': 'Today, we are going to perform two hands-on projects on COVID data analysis using Python and Tableau.', 'start': 61.874, 'duration': 4.481}, 
{'end': 73.217, 'text': "This is going to be a really interesting and fun session where I'll be asking you a few generic quiz questions related to coronavirus.", 'start': 67.075, 'duration': 6.142}, {'end': 75.958, 'text': 'Please make sure to answer them in the comment section of the video.', 'start': 73.677, 'duration': 2.281}, {'end': 77.398, 'text': "We'll be happy to hear from you.", 'start': 76.258, 'duration': 1.14}], 'summary': 'Python tutorial covers covid and olympics data analysis projects for 2022.', 'duration': 51.115, 'max_score': 26.283, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs26283.jpg'}, {'end': 220.297, 'src': 'embed', 'start': 181.111, 'weight': 1, 'content': [{'end': 185.734, 'text': 'You will learn how to create different plots in Tableau and then make a dashboard from the visuals.', 'start': 181.111, 'duration': 4.623}, {'end': 195.83, 'text': 'The project will give you an idea about the impact of coronavirus globally in terms of confirmed cases, deaths reported, the number of recoveries,', 'start': 186.615, 'duration': 9.215}, {'end': 196.89, 'text': 'as well as active cases.', 'start': 195.83, 'duration': 1.06}, {'end': 207.953, 'text': 'We will also see how India has been affected since the pandemic started and dive into the different states and union territories to learn more about the COVID-19 influence and the vaccination status.', 'start': 198.51, 'duration': 9.443}, {'end': 211.975, 'text': "First, let me show you the two datasets that we'll be using.", 'start': 208.974, 'duration': 3.001}, {'end': 220.297, 'text': "So for our first project using Python, we'll be using the first two datasets, COVID-19 India and COVID vaccine statewide.", 'start': 212.655, 'duration': 7.642}], 'summary': "Learn to create tableau plots, visualize global covid-19 impact, and explore india's pandemic and vaccination data.", 'duration': 39.186, 'max_score': 181.111, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs181111.jpg'}], 'start': 26.283, 'title': 'Python and tableau for covid-19 data analysis', 'summary': 'Covers two hands-on projects on covid data analysis using python and tableau, including real-world data, python libraries like numpy, pandas, matplotlib, and seaborn, and statistics on covid-19 cases and deaths, with over 22 crore global infections and 4.5 million deaths, including india reporting over 3.3 crore confirmed cases and nearly 4,41,000 deaths. it also involves a covid-19 data analysis project using python and tableau, featuring the impact of coronavirus globally and in india, utilizing three datasets to analyze confirmed cases, deaths, recoveries, active cases, and vaccination status.', 'chapters': [{'end': 127.813, 'start': 26.283, 'title': 'Python data analysis 2022', 'summary': 'Covers two hands-on projects on covid data analysis using python and tableau, including using real-world data, python libraries like numpy, pandas, matplotlib, and seaborn, and statistics on covid-19 cases and deaths, including over 22 crore global infections and 4.5 million deaths, with india reporting over 3.3 crore confirmed cases and nearly 4,41,000 deaths.', 'duration': 101.53, 'highlights': ['The chapter covers two hands-on projects on COVID data analysis using Python and Tableau, including using real-world data, Python libraries like NumPy, Pandas, Matplotlib, and Seaborn, and statistics on COVID-19 cases and deaths, including over 22 crore global infections and 4.5 million deaths, with India reporting over 3.3 crore confirmed cases and nearly 4,41,000 deaths.', 'The two projects are based on Coronavirus and Olympics data, providing practical learning opportunities for data analysis and data visualization using Python programming.', 'The video tutorial aims to teach how to collect, analyze, clean, manipulate, and visualize data with the help of Python libraries such as NumPy, 
Pandas, Matplotlib, and Seaborn, offering hands-on experience in solving real-world problems using exploratory data analysis.', 'The COVID-19 pandemic has infected over 22 crore people and killed more than 4.5 million globally, with India reporting over 3.3 crore confirmed cases and nearly 4,41,000 deaths, as per official figures released by the Union Ministry of Health and Family Welfare.', 'On March 11, 2020, the WHO declared COVID-19 a global health emergency, marking the severity and impact of the ongoing pandemic.']}, {'end': 400.7, 'start': 127.813, 'title': 'Covid-19 data analysis project', 'summary': 'Involves a covid-19 data analysis project using python and tableau, featuring the impact of coronavirus globally and in india, utilizing three datasets to analyze confirmed cases, deaths, recoveries, active cases, and vaccination status.', 'duration': 272.887, 'highlights': ['The project involves analyzing COVID-19 data using Python and Tableau, focusing on confirmed cases, deaths, recoveries, and active cases globally and in India. COVID-19 datasets, Python libraries, Tableau plots', 'The project will provide hands-on experience with real-world datasets to visualize data and draw conclusions. Hands-on experience, real-world datasets', 'The datasets to be used in the project include COVID-19 India, COVID vaccine statewide, featuring information on confirmed cases, deaths, recoveries, vaccination status, and more. COVID-19 India, COVID vaccine statewide, confirmed cases, deaths, recoveries, vaccination status', 'The COVID-19 India dataset comprises data on different states, confirmed cases, recoveries, deaths reported, and total confirmed cases, with a focus on recent data as of 11th August 2021. 
Data on different states, confirmed cases, recoveries, deaths reported, total confirmed cases, data as of 11th August 2021', 'The COVID vaccine statewide dataset includes information on doses administered, vaccination sites, vaccine types, gender distribution, age groups, and total individuals vaccinated daily. Doses administered, vaccination sites, vaccine types, gender distribution, age groups, total individuals vaccinated daily']}], 'duration': 374.417, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs26283.jpg', 'highlights': ['The chapter covers two hands-on projects on COVID data analysis using Python and Tableau, including using real-world data, Python libraries like NumPy, Pandas, Matplotlib, and Seaborn, and statistics on COVID-19 cases and deaths, including over 22 crore global infections and 4.5 million deaths, with India reporting over 3.3 crore confirmed cases and nearly 4,41,000 deaths.', 'The project involves analyzing COVID-19 data using Python and Tableau, focusing on confirmed cases, deaths, recoveries, and active cases globally and in India. COVID-19 datasets, Python libraries, Tableau plots', 'The two projects are based on Coronavirus and Olympics data, providing practical learning opportunities for data analysis and data visualization using Python programming.', 'The video tutorial aims to teach how to collect, analyze, clean, manipulate, and visualize data with the help of Python libraries such as NumPy, Pandas, Matplotlib, and Seaborn, offering hands-on experience in solving real-world problems using exploratory data analysis.', 'The datasets to be used in the project include COVID-19 India, COVID vaccine statewide, featuring information on confirmed cases, deaths, recoveries, vaccination status, and more. 
COVID-19 India, COVID vaccine statewide, confirmed cases, deaths, recoveries, vaccination status']}, {'end': 1432.988, 'segs': [{'end': 429.736, 'src': 'embed', 'start': 400.7, 'weight': 0, 'content': [{'end': 405.964, 'text': "all right, before we jump into the hands-on part, let's have a look at the second quiz in this project.", 'start': 400.7, 'duration': 5.264}, {'end': 416.27, 'text': 'so here is the second quiz question which is the first country to start covid vaccination for toddlers?', 'start': 408.007, 'duration': 8.263}, {'end': 422.253, 'text': 'is it a japan, b israel, c portugal or is it d cuba?', 'start': 416.27, 'duration': 5.983}, {'end': 424.874, 'text': 'this is a very recent development that took place.', 'start': 422.253, 'duration': 2.621}, {'end': 429.736, 'text': 'if you watch daily news updates on coronavirus, you will be definitely able to answer the question.', 'start': 424.874, 'duration': 4.862}], 'summary': 'Second quiz: which country started covid vaccination for toddlers first? 
options: japan, israel, portugal, or cuba.', 'duration': 29.036, 'max_score': 400.7, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs400700.jpg'}, {'end': 481.247, 'src': 'embed', 'start': 448.533, 'weight': 1, 'content': [{'end': 455.921, 'text': "I'll just rename this notebook as COVID data analysis project.", 'start': 448.533, 'duration': 7.388}, {'end': 460.397, 'text': 'click on rename.', 'start': 458.476, 'duration': 1.921}, {'end': 461.598, 'text': 'all right.', 'start': 460.397, 'duration': 1.201}, {'end': 467.34, 'text': 'so first and foremost, we need to import all the necessary libraries that we are going to use.', 'start': 461.598, 'duration': 5.742}, {'end': 469.561, 'text': "so first i'm importing pandas as pd.", 'start': 467.34, 'duration': 2.221}, {'end': 472.523, 'text': 'this is for data manipulation.', 'start': 469.561, 'duration': 2.962}, {'end': 477.445, 'text': 'then we have numpy as np. numpy is used for numerical computation.', 'start': 472.523, 'duration': 4.922}, {'end': 481.247, 'text': 'then we are importing matplotlib, seaborn and plotly.', 'start': 477.445, 'duration': 3.802}], 'summary': 'Notebook renamed as covid data analysis project. 
libraries imported: pandas, numpy, matplotlib, seaborn, plotly.', 'duration': 32.714, 'max_score': 448.533, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs448533.jpg'}, {'end': 582.783, 'src': 'embed', 'start': 541.963, 'weight': 2, 'content': [{'end': 551.728, 'text': "will change the backslash to forward slash and after that I'm going to give my file name,", 'start': 541.963, 'duration': 9.765}, {'end': 554.789, 'text': 'followed by the extension of the file as a covid19india.csv.', 'start': 551.728, 'duration': 3.061}, {'end': 571.371, 'text': "let's go ahead and run it all right now to see the first few rows of the data frame.", 'start': 561.701, 'duration': 9.67}, {'end': 573.553, 'text': "I'm going to use the head function.", 'start': 571.371, 'duration': 2.182}, {'end': 582.783, 'text': "I'll say head and within brackets let's say I'll pass in 10, which means I want to see the first 10 rows of data.", 'start': 573.553, 'duration': 9.23}], 'summary': 'Convert backslash to forward slash, show first 10 rows of covid19india.csv.', 'duration': 40.82, 'max_score': 541.963, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs541963.jpg'}, {'end': 635.3, 'src': 'embed', 'start': 608.676, 'weight': 3, 'content': [{'end': 619.555, 'text': "all right, moving ahead, let's use the info function to get some idea about a data set.", 'start': 608.676, 'duration': 10.879}, {'end': 625.577, 'text': 'if i run it, you can see here it gives us the total number of columns.', 'start': 619.555, 'duration': 6.022}, {'end': 628.918, 'text': 'we have nine columns, the total number of entries or the rows.', 'start': 625.577, 'duration': 3.341}, {'end': 635.3, 'text': 'we have eighteen thousand hundred and ten rows of information, starting from zero till eighteen thousand one hundred and nine.', 'start': 628.918, 'duration': 6.382}], 'summary': 'Using the info function, we have 
9 columns and 18,110 rows of data.', 'duration': 26.624, 'max_score': 608.676, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs608676.jpg'}, {'end': 703.342, 'src': 'embed', 'start': 668.115, 'weight': 4, 'content': [{'end': 679.602, 'text': 'okay, so if you can see here the describe function is for numerical columns only and you have the measures such as count,', 'start': 668.115, 'duration': 11.487}, {'end': 690.269, 'text': 'the mean standard deviation maximum minimum, the 25th percentile, 50th percentile and the 75th percentile value.', 'start': 679.602, 'duration': 10.667}, {'end': 699.799, 'text': "okay, now, let's move ahead and import the second data set, which is related to vaccination.", 'start': 690.269, 'duration': 9.53}, {'end': 703.342, 'text': "so I'll create a variable called vaccine underscore DF.", 'start': 699.799, 'duration': 3.543}], 'summary': 'The describe function provides measures for numerical columns, including count, mean, standard deviation, maximum, minimum, and percentiles. 
another dataset related to vaccination will be imported.', 'duration': 35.227, 'max_score': 668.115, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs668115.jpg'}, {'end': 813.949, 'src': 'embed', 'start': 742.744, 'weight': 5, 'content': [{'end': 746.447, 'text': 'covid underscore vaccine underscore state wise.', 'start': 742.744, 'duration': 3.703}, {'end': 759.039, 'text': "all right, let me run it cool and let's display the first seven rows of information from this data frame.", 'start': 746.447, 'duration': 12.592}, {'end': 762.482, 'text': "i'll be using the head function and inside the function i'll pass in seven.", 'start': 759.039, 'duration': 3.443}, {'end': 765.079, 'text': 'there you go.', 'start': 764.298, 'duration': 0.781}, {'end': 767.9, 'text': 'so here you can see we have from 0 till 6.', 'start': 765.079, 'duration': 2.821}, {'end': 771.583, 'text': 'there are total 24 columns.', 'start': 767.9, 'duration': 3.683}, {'end': 773.604, 'text': 'a lot of them have null values.', 'start': 771.583, 'duration': 2.021}, {'end': 782.25, 'text': 'you can see here all right now.', 'start': 773.604, 'duration': 8.646}, {'end': 793.148, 'text': "from the first data set, which is the covid underscore df data frame, we'll be dropping a few unnecessary columns,", 'start': 782.25, 'duration': 10.898}, {'end': 801.697, 'text': 'such as the time column confirmed Indian national and confirmed foreign national, as well as the s number.', 'start': 793.148, 'duration': 8.549}, {'end': 808.904, 'text': "we don't need these columns, so it's better to learn how to drop the columns for our analysis.", 'start': 801.697, 'duration': 7.207}, {'end': 813.949, 'text': "so I'll say covid underscore df dot.", 'start': 808.904, 'duration': 5.045}], 'summary': 'Analyzing covid-19 state-wise data with 24 columns and removing unnecessary columns for analysis.', 'duration': 71.205, 'max_score': 742.744, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs742744.jpg'}, {'end': 937.395, 'src': 'embed', 'start': 892.177, 'weight': 11, 'content': [{'end': 898.542, 'text': 'okay, now we have removed these four columns.', 'start': 892.177, 'duration': 6.365}, {'end': 904.747, 'text': 'let me show you the data set now there you go.', 'start': 898.542, 'duration': 6.205}, {'end': 911.844, 'text': 'so we have only the date column state or union territory cured deaths and confirmed.', 'start': 906.061, 'duration': 5.783}, {'end': 919.628, 'text': "now let's see how you can change the format of the date column.", 'start': 911.844, 'duration': 7.784}, {'end': 924.931, 'text': 'for that you have the function called to date time.', 'start': 919.628, 'duration': 5.303}, {'end': 929.073, 'text': "i'll say covid underscore df.", 'start': 924.931, 'duration': 4.142}, {'end': 930.874, 'text': "i'll pass in my column name, that is date.", 'start': 929.073, 'duration': 1.801}, {'end': 937.395, 'text': "I'll say equal to pd dot.", 'start': 932.651, 'duration': 4.744}], 'summary': 'Four columns removed, date, state/ut, cured, deaths, and confirmed in the dataset; using to_datetime function to change date format.', 'duration': 45.218, 'max_score': 892.177, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs892177.jpg'}, {'end': 1131.202, 'src': 'embed', 'start': 1104.74, 'weight': 7, 'content': [{'end': 1115.545, 'text': 'so in this table we will be summing all the confirmed deaths and cured cases for each of the states and union territories.', 'start': 1104.74, 'duration': 10.805}, {'end': 1118.727, 'text': 'so we will be using the pivot underscore table function for this.', 'start': 1115.545, 'duration': 3.182}, {'end': 1126.151, 'text': 'I will create a variable called state wise and say pd dot.', 'start': 1118.727, 'duration': 7.424}, {'end': 1131.202, 'text': 'I will use the pivot 
underscore table function.', 'start': 1128.601, 'duration': 2.601}], 'summary': 'Summing confirmed deaths and cured cases by state using pivot table.', 'duration': 26.462, 'max_score': 1104.74, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs1104740.jpg'}, {'end': 1352.076, 'src': 'embed', 'start': 1228.748, 'weight': 8, 'content': [{'end': 1243.815, 'text': "So I'll say state wise and within square brackets I'll pass in my variable that I want to create which is recovery rate.", 'start': 1228.748, 'duration': 15.067}, {'end': 1264.716, 'text': 'This will be equal to the cured cases multiplied by 100 by the total number of confirmed cases.', 'start': 1246.657, 'duration': 18.059}, {'end': 1270.301, 'text': "Within square brackets, I'll give my column as confirmed.", 'start': 1267.919, 'duration': 2.382}, {'end': 1273.831, 'text': "let's run this.", 'start': 1272.23, 'duration': 1.601}, {'end': 1278.195, 'text': "okay, I'll just copy this column, paste it here.", 'start': 1273.831, 'duration': 4.364}, {'end': 1281.878, 'text': 'so this time we are going to find out the mortality rate.', 'start': 1278.195, 'duration': 3.683}, {'end': 1289.385, 'text': 'so mortality rate is nothing but the total number of deaths, divided by the total number of confirmed cases into 100.', 'start': 1281.878, 'duration': 7.507}, {'end': 1294.409, 'text': 'so I am just going to replace the names here.', 'start': 1289.385, 'duration': 5.024}, {'end': 1309.228, 'text': "I'll say mortality, alright, and then instead of cured I will say my deaths column into 100, divided by the confirmed cases.", 'start': 1294.409, 'duration': 14.819}, {'end': 1310.549, 'text': "let's run it.", 'start': 1309.228, 'duration': 1.321}, {'end': 1322.657, 'text': 'ok, now we are going to sort the values based on the confirmed cases column and we will sort it in descending order.', 'start': 1310.549, 'duration': 12.108}, {'end': 1323.818, 'text': 'so let me 
show you how to do it.', 'start': 1322.657, 'duration': 1.161}, {'end': 1325.759, 'text': 'I will say state wise equal to.', 'start': 1323.818, 'duration': 1.941}, {'end': 1330.48, 'text': "I'll use the function sort underscore values.", 'start': 1327.478, 'duration': 3.002}, {'end': 1352.076, 'text': "so I'll pass in my variable state wise dot and use the sort underscore values function I'll say by I want to sort it by my confirmed cases column.", 'start': 1330.48, 'duration': 21.596}], 'summary': 'Creating recovery and mortality rates, sorting by confirmed cases.', 'duration': 123.328, 'max_score': 1228.748, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs1228748.jpg'}], 'start': 400.7, 'title': 'Covid-19 vaccination and data analysis', 'summary': "Discusses recent covid-19 vaccination for toddlers and necessary libraries for data analysis. it covers loading covid-19 data for india, data analysis and manipulation using python's pandas library, and creating a pivot table for visual representation.", 'chapters': [{'end': 582.783, 'start': 400.7, 'title': 'Covid vaccination and data analysis', 'summary': 'Discusses the recent covid-19 vaccination for toddlers, the necessary libraries for data analysis, and the loading of covid-19 data for india using python and jupyter notebook.', 'duration': 182.083, 'highlights': ['The first country to start COVID vaccination for toddlers is a recent development, and the options are Japan, Israel, Portugal, and Cuba.', 'The necessary libraries for data analysis are pandas for data manipulation, numpy for numerical computation, and matplotlib, seaborn, and plotly for creating visualizations.', "The process of loading the COVID-19 dataset for India involves using the pandas library's read_csv function to load the data from the 'covid19india.csv' file and displaying the first 10 rows of the data using the head function."]}, {'end': 892.177, 'start': 584.184, 'title': 'Data 
analysis and data set manipulation', 'summary': "Covers data analysis and manipulation using python's pandas library, including obtaining statistical information and manipulating data sets. it demonstrates reading and displaying data, dropping unnecessary columns, and performing basic statistical analysis on the data set.", 'duration': 307.993, 'highlights': ["The chapter provides a demonstration of using Python's pandas library to analyze data, showing 18,110 rows of information and providing details about the columns and memory usage. It displays 18,110 rows of information and provides details about the columns and memory usage, demonstrating the data analysis using Python's pandas library.", 'It showcases the use of the describe function to obtain basic statistics for numerical columns, including count, mean, standard deviation, maximum, minimum, and percentile values. The chapter demonstrates the use of the describe function to obtain basic statistics for numerical columns, including count, mean, standard deviation, maximum, minimum, and percentile values.', 'The chapter illustrates importing a second data set related to vaccination and displaying the first seven rows of information from the data frame. It illustrates importing a second data set related to vaccination and displaying the first seven rows of information from the data frame, showcasing the process of working with multiple data sets.', 'It provides a step-by-step demonstration of dropping unnecessary columns from a data frame using the drop function, ensuring efficient data manipulation for analysis. 
The chapter provides a step-by-step demonstration of dropping unnecessary columns from a data frame using the drop function, ensuring efficient data manipulation for analysis.']}, {'end': 1432.988, 'start': 892.177, 'title': 'Pivot table and data manipulation', 'summary': 'Covers the process of data manipulation, including changing the date format, calculating active cases, creating a pivot table, finding recovery and mortality rates, sorting values, and plotting a pivot table visually.', 'duration': 540.811, 'highlights': ['Creating a pivot table to sum confirmed, deaths, and cured cases for each state/union territory The chapter demonstrates the creation of a pivot table using the pivot_table function in pandas to sum confirmed, deaths, and cured cases for each state/union territory.', 'Calculating recovery rate by dividing total cured cases by total confirmed cases The process involves calculating the recovery rate by dividing the total number of cured cases by the total number of confirmed cases and multiplying the result by 100.', 'Sorting values based on confirmed cases column in descending order The chapter explains how to sort values based on the confirmed cases column and sort them in descending order using the sort_values function in pandas.', 'Calculating mortality rate by dividing total deaths by total confirmed cases The process involves calculating the mortality rate by dividing the total number of deaths by the total number of confirmed cases and multiplying the result by 100.', 'Changing the date column format using the to_datetime function The chapter showcases the process of changing the format of the date column using the to_datetime function in pandas.']}], 'duration': 1032.288, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs400700.jpg', 'highlights': ['The first country to start COVID vaccination for toddlers is a recent development, and the options are Japan, Israel, Portugal, and 
Cuba.', 'The necessary libraries for data analysis are pandas for data manipulation, numpy for numerical computation, and matplotlib, seaborn, and plotly for creating visualizations.', "The process of loading the COVID-19 dataset for India involves using the pandas library's read_csv function to load the data from the 'covid19india.csv' file and displaying the first 10 rows of the data using the head function.", "The chapter provides a demonstration of using Python's pandas library to analyze data, showing 18,110 rows of information and providing details about the columns and memory usage.", 'It showcases the use of the describe function to obtain basic statistics for numerical columns, including count, mean, standard deviation, maximum, minimum, and percentile values.', 'It illustrates importing a second data set related to vaccination and displaying the first seven rows of information from the data frame, showcasing the process of working with multiple data sets.', 'It provides a step-by-step demonstration of dropping unnecessary columns from a data frame using the drop function, ensuring efficient data manipulation for analysis.', 'Creating a pivot table to sum confirmed, deaths, and cured cases for each state/union territory', 'Calculating recovery rate by dividing total cured cases by total confirmed cases The process involves calculating the recovery rate by dividing the total number of cured cases by the total number of confirmed cases and multiplying the result by 100.', 'Sorting values based on confirmed cases column in descending order The chapter explains how to sort values based on the confirmed cases column and sort them in descending order using the sort_values function in pandas.', 'Calculating mortality rate by dividing total deaths by total confirmed cases The process involves calculating the mortality rate by dividing the total number of deaths by the total number of confirmed cases and multiplying the result by 100.', 'Changing the date column 
format using the to_datetime function The chapter showcases the process of changing the format of the date column using the to_datetime function in pandas.']}, {'end': 2764.581, 'segs': [{'end': 1587.519, 'src': 'embed', 'start': 1467.028, 'weight': 0, 'content': [{'end': 1471.129, 'text': 'now, as i said in the beginning, there are a few discrepancies in the data set.', 'start': 1467.028, 'duration': 4.101}, {'end': 1479.077, 'text': "so here you can see there's one called Maharashtra and there's also Maharashtra triple star.", 'start': 1471.129, 'duration': 7.948}, {'end': 1481.598, 'text': 'this you can ignore, even if I scroll down.', 'start': 1479.077, 'duration': 2.521}, {'end': 1484.159, 'text': 'you have Madhya Pradesh, followed by three asterisks.', 'start': 1481.598, 'duration': 2.561}, {'end': 1486.72, 'text': 'you can ignore this value as well, even for Bihar we have.', 'start': 1484.159, 'duration': 2.561}, {'end': 1497.009, 'text': 'so these have been duplicated and here you can see the different state names and unit entries.', 'start': 1488.564, 'duration': 8.445}, {'end': 1505.474, 'text': 'then on the top you have the confirmed cases, cured cases, the deaths reported and the new columns that we created.', 'start': 1497.009, 'duration': 8.465}, {'end': 1518.102, 'text': 'these are calculated columns recovery rate and mortality rate and we have ordered it in descending order of confirmed cases so so far.', 'start': 1505.474, 'duration': 12.628}, {'end': 1529.412, 'text': 'Our data says that Maharashtra has the highest number of cases, followed by Kerala, Karnataka, Tamil Nadu, Andhra Pradesh and Uttar Pradesh.', 'start': 1520.689, 'duration': 8.723}, {'end': 1535.174, 'text': 'So these are the top five states which have the highest number of confirmed cases.', 'start': 1529.452, 'duration': 5.722}, {'end': 1539.215, 'text': 'Even if you see the mortality rate is also high for Maharashtra.', 'start': 1536.014, 'duration': 3.201}, {'end': 1547.738, 
'text': 'And if I scroll down, the mortality rate is also high for Uttarakhand if you see here.', 'start': 1540.916, 'duration': 6.822}, {'end': 1554.117, 'text': 'If I scroll further, you have Punjab, the mortality rate is also high.', 'start': 1549.716, 'duration': 4.401}, {'end': 1556.098, 'text': 'All right.', 'start': 1555.738, 'duration': 0.36}, {'end': 1565.921, 'text': 'So this was our first visual that we created in the COVID data analysis project.', 'start': 1559.319, 'duration': 6.602}, {'end': 1574.843, 'text': "Now moving ahead, we'll see the top 10 states based on the number of active cases.", 'start': 1567.061, 'duration': 7.782}, {'end': 1577.024, 'text': "So we'll start.", 'start': 1576.344, 'duration': 0.68}, {'end': 1580.416, 'text': "I'll give a comment.", 'start': 1579.636, 'duration': 0.78}, {'end': 1587.519, 'text': 'top 10 active cases states.', 'start': 1580.416, 'duration': 7.103}], 'summary': 'Data analysis shows maharashtra has highest cases, followed by kerala, karnataka, tamil nadu, andhra pradesh, and uttar pradesh.', 'duration': 120.491, 'max_score': 1467.028, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs1467028.jpg'}, {'end': 2079.382, 'src': 'embed', 'start': 2045.091, 'weight': 4, 'content': [{'end': 2053.255, 'text': "Give us piece and we are going to copy the cell and I'll paste it here.", 'start': 2045.091, 'duration': 8.164}, {'end': 2060.306, 'text': "all right now it's time to run it.", 'start': 2055.222, 'duration': 5.084}, {'end': 2062.328, 'text': 'there you go, you see it.', 'start': 2060.306, 'duration': 2.022}, {'end': 2066.512, 'text': 'here we have a nice bar plot ready.', 'start': 2062.328, 'duration': 4.184}, {'end': 2073.157, 'text': 'on the top you can see the title top 10 states with most active cases, and you see the edges are in red color.', 'start': 2066.512, 'duration': 6.645}, {'end': 2074.958, 'text': 'for all the bars.', 'start': 2073.157, 
'duration': 1.801}, {'end': 2079.382, 'text': 'on the x-axis you have the different state names Maharashtra, Karnataka, Kerala.', 'start': 2074.958, 'duration': 4.424}], 'summary': 'A bar plot shows top 10 states with most active cases: maharashtra, karnataka, kerala.', 'duration': 34.291, 'max_score': 2045.091, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs2045091.jpg'}, {'end': 2419.394, 'src': 'embed', 'start': 2379.694, 'weight': 6, 'content': [{'end': 2394.901, 'text': 'x label would be states my y label will be total death cases.', 'start': 2379.694, 'duration': 15.207}, {'end': 2401.004, 'text': 'then I will write plt.show.', 'start': 2394.901, 'duration': 6.103}, {'end': 2403.325, 'text': "now let's run it.", 'start': 2401.004, 'duration': 2.321}, {'end': 2404.826, 'text': 'there you go.', 'start': 2403.325, 'duration': 1.501}, {'end': 2407.167, 'text': 'you can see here we have a nice bar plot.', 'start': 2404.826, 'duration': 2.341}, {'end': 2410.909, 'text': 'on the top we have the title top 10 states with most deaths.', 'start': 2407.167, 'duration': 3.742}, {'end': 2416.612, 'text': 'now what I specifically wanted you to see was these discrepancies in the data.', 'start': 2410.909, 'duration': 5.703}, {'end': 2419.394, 'text': 'you can see here Maharashtra is repeated twice.', 'start': 2416.612, 'duration': 2.782}], 'summary': 'Data visualization shows discrepancies in death cases, with maharashtra repeated twice.', 'duration': 39.7, 'max_score': 2379.694, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs2379694.jpg'}, {'end': 2474.97, 'src': 'embed', 'start': 2447.513, 'weight': 9, 'content': [{'end': 2454.257, 'text': 'so we have Maharashtra, Karnataka, Tamil Nadu, Delhi, then Uttar Pradesh, West Bengal, Kerala, Punjab,', 'start': 2447.513, 'duration': 6.744}, {'end': 2461.162, 'text': 'Andhra Pradesh and Chhattisgarh with the states 
that have the most number of deaths reported.', 'start': 2454.257, 'duration': 6.905}, {'end': 2474.97, 'text': "okay, now we'll create a line plot to see the growth or the trend of active cases for top five states with most number of confirmed cases.", 'start': 2461.162, 'duration': 13.808}], 'summary': "Top 10 states with most deaths reported and line plot for top 5 states' active cases.", 'duration': 27.457, 'max_score': 2447.513, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs2447513.jpg'}], 'start': 1432.988, 'title': 'Covid-19 state analysis', 'summary': 'Demonstrates the creation of pivot tables, visualization of covid-19 data, and identification of top states with the highest confirmed cases and mortality rates, with maharashtra having the highest number of cases and high mortality rates. it also includes creating a bar plot of the top 10 states with the most active cases, and examining the top five affected states in india, highlighting maharashtra, karnataka, kerala, tamil nadu, and uttar pradesh.', 'chapters': [{'end': 1740.927, 'start': 1432.988, 'title': 'Covid data analysis visualizations', 'summary': 'Demonstrates the creation of pivot tables, visualization of covid-19 data, and identification of top states with the highest confirmed cases and mortality rates, with maharashtra having the highest number of cases and high mortality rates.', 'duration': 307.939, 'highlights': ['The chapter starts by creating a pivot table and identifying discrepancies in the data set, such as duplicate state names and unit entries. Identifying discrepancies in the data set, including duplicate state names and unit entries.', 'The analysis reveals the top five states with the highest number of confirmed COVID-19 cases, with Maharashtra leading, followed by Kerala, Karnataka, Tamil Nadu, and Andhra Pradesh. 
Highlighting the top five states with the highest number of confirmed COVID-19 cases.', 'The visualization also showcases the high mortality rates in Maharashtra, Uttarakhand, and Punjab, emphasizing the severity of the situation in these states. Emphasizing the high mortality rates in Maharashtra, Uttarakhand, and Punjab.', 'The process continues with the exploration of the top 10 states based on the number of active COVID-19 cases using the pandas groupby function. Exploring the top 10 states based on the number of active COVID-19 cases using the pandas groupby function.']}, {'end': 2109.631, 'start': 1740.927, 'title': 'Creating bar plot of top 10 active cases', 'summary': 'Demonstrates creating a bar plot of the top 10 states with the most active cases in india, highlighting the highest number of cases in maharashtra, followed by karnataka, kerala, and tamil nadu, with the total active cases in lakhs.', 'duration': 368.704, 'highlights': ['Creating a bar plot of the top 10 states with the most active cases, with Maharashtra having the highest number of cases, followed by Karnataka, Kerala, and Tamil Nadu at 2nd, 3rd, and 4th place respectively.', "Setting the figure size to 16, comma 9 and providing a title for the plot as 'Top 10 states with most active cases in India' with a size of 25.", "Defining the x-axis as state/union territory and the y-axis as active cases, and addressing overlapping state labels by setting X label as 'states' and Y label as 'total active cases' with the total active cases displayed in lakhs."]}, {'end': 2447.513, 'start': 2109.631, 'title': 'Top 10 states with most deaths', 'summary': 'Demonstrates the process of creating a bar plot to display the top 10 states with the most deaths due to discrepancies in the data, including maharashtra being repeated twice and a spelling error in karnataka, resulting in the exclusion of these two results.', 'duration': 337.882, 'highlights': ['Creating a bar plot to display the top 10 states with 
the most deaths The chapter demonstrates the process of creating a bar plot to display the top 10 states with the most deaths.', 'Discrepancies in the data, including Maharashtra being repeated twice and a spelling error in Karnataka, resulting in the exclusion of these two results The chapter highlights discrepancies in the data, such as Maharashtra being repeated twice and a spelling error in Karnataka, resulting in the exclusion of these two results.']}, {'end': 2764.581, 'start': 2447.513, 'title': 'Covid-19 state analysis', 'summary': 'Examines the top five affected states in india, highlighting maharashtra, karnataka, kerala, tamil nadu, and uttar pradesh, with maharashtra experiencing a surge in active cases around april and may, followed by karnataka, uttar pradesh, and tamil nadu.', 'duration': 317.068, 'highlights': ['The top five affected states in India are Maharashtra, Karnataka, Kerala, Tamil Nadu, and Uttar Pradesh, with Maharashtra experiencing a surge in active cases around April and May.', 'Karnataka also witnessed a rapid surge in active cases around May and June before decreasing.', 'Uttar Pradesh and Tamil Nadu also experienced surges in active cases, highlighting the impact of COVID-19 on these states.', 'The line plot visualizes the growth or trend of active cases for the top five affected states in India, providing a clear representation of the surge and decline in active cases over time.']}], 'duration': 1331.593, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs1432988.jpg', 'highlights': ['Identifying discrepancies in the data set, including duplicate state names and unit entries.', 'Highlighting the top five states with the highest number of confirmed COVID-19 cases.', 'Emphasizing the high mortality rates in Maharashtra, Uttarakhand, and Punjab.', 'Exploring the top 10 states based on the number of active COVID-19 cases using the pandas groupby function.', 'Creating a bar plot of 
the top 10 states with the most active cases, with Maharashtra having the highest number of cases.', "Setting the figure size to 16, comma 9 and providing a title for the plot as 'Top 10 states with most active cases in India' with a size of 25.", 'Creating a bar plot to display the top 10 states with the most deaths.', 'The chapter highlights discrepancies in the data, such as Maharashtra being repeated twice and a spelling error in Karnataka, resulting in the exclusion of these two results.', 'The top five affected states in India are Maharashtra, Karnataka, Kerala, Tamil Nadu, and Uttar Pradesh, with Maharashtra experiencing a surge in active cases around April and May.', 'The line plot visualizes the growth or trend of active cases for the top five affected states in India, providing a clear representation of the surge and decline in active cases over time.']}, {'end': 3876.559, 'segs': [{'end': 2830.672, 'src': 'embed', 'start': 2764.581, 'weight': 0, 'content': [{'end': 2779.131, 'text': 'you can see one common trend that after March, so around April, the cases started to emerge very rapidly and later, after July, they started dipping.', 'start': 2764.581, 'duration': 14.55}, {'end': 2787.556, 'text': 'okay, now we are going to use our second data set, which is related to vaccination.', 'start': 2779.131, 'duration': 8.425}, {'end': 2800.046, 'text': 'all right, so, first and foremost, let me go ahead and print the data frame for you so that you know the data that we are going to use.', 'start': 2787.556, 'duration': 12.49}, {'end': 2806.149, 'text': 'so this is the data we are going to use for our next set of analysis.', 'start': 2800.046, 'duration': 6.103}, {'end': 2810.33, 'text': 'even this data has a few errors and you can see there are a lot of missing values.', 'start': 2806.149, 'duration': 4.181}, {'end': 2816.773, 'text': 'it makes sense because not all the days you would have vaccination for people.', 'start': 2810.33, 'duration': 6.443}, {'end': 
2825.232, 'text': 'and very crucial error that we are going to deal with is this state column.', 'start': 2816.773, 'duration': 8.459}, {'end': 2830.672, 'text': 'here you can see in the state column we have India in a few rows.', 'start': 2825.232, 'duration': 5.44}], 'summary': 'Rapid surge in cases after march, followed by a decline post-july. analysis of vaccination data reveals errors and missing values.', 'duration': 66.091, 'max_score': 2764.581, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs2764581.jpg'}, {'end': 3410.582, 'src': 'embed', 'start': 3377.017, 'weight': 2, 'content': [{'end': 3382.338, 'text': "so we'll take some time to create the pie chart.", 'start': 3377.017, 'duration': 5.321}, {'end': 3384.399, 'text': 'here we go.', 'start': 3382.338, 'duration': 2.061}, {'end': 3389.42, 'text': 'see, here i have my title male and female vaccination.', 'start': 3384.399, 'duration': 5.021}, {'end': 3393.221, 'text': 'so these are the two pies or the areas.', 'start': 3389.42, 'duration': 3.801}, {'end': 3397.572, 'text': 'have the label female and the value.', 'start': 3394.83, 'duration': 2.742}, {'end': 3400.455, 'text': 'here you have the label male and the value.', 'start': 3397.572, 'duration': 2.883}, {'end': 3410.582, 'text': 'so from our data you can see 53 percent male individuals have been vaccinated, compared to 47 percent for females.', 'start': 3400.455, 'duration': 10.127}], 'summary': 'Pie chart shows 53% male, 47% female vaccination rates.', 'duration': 33.565, 'max_score': 3377.017, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs3377017.jpg'}, {'end': 3721.707, 'src': 'embed', 'start': 3685.957, 'weight': 3, 'content': [{'end': 3687.838, 'text': 'So this will create a nice table for us.', 'start': 3685.957, 'duration': 1.881}, {'end': 3688.779, 'text': 'There you go.', 'start': 3688.379, 'duration': 0.4}, {'end': 
3693.463, 'text': 'Here you can see we have Maharashtra, Uttar Pradesh, Rajasthan,', 'start': 3689.64, 'duration': 3.823}, {'end': 3697.566, 'text': 'Gujarat and West Bengal as the top five states with most number of vaccinated individuals.', 'start': 3693.463, 'duration': 4.103}, {'end': 3703.031, 'text': 'Now we are going to use this table and convert it into a chart.', 'start': 3698.727, 'duration': 4.304}, {'end': 3709.683, 'text': "Now I had already written my code for the bar plot, I'm just going to paste it.", 'start': 3704.181, 'duration': 5.502}, {'end': 3721.707, 'text': "So here, I have defined my figure size as 10 comma five, then I'm giving a title as top five vaccinated states in India and the size is 20.", 'start': 3710.984, 'duration': 10.723}], 'summary': 'Top 5 states with most vaccinated individuals: maharashtra, uttar pradesh, rajasthan, gujarat, west bengal.', 'duration': 35.75, 'max_score': 3685.957, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs3685957.jpg'}], 'start': 2764.581, 'title': 'Covid case trends and vaccination analysis', 'summary': 'Discusses the trends of covid cases from march to july, with a rapid increase in cases from april and a subsequent decline after july. it also covers the analysis of vaccination data, highlighting errors and missing values, renaming columns, and overview of a dataset with 24 columns and 7845 entries. additionally, it involves analyzing a dataset with missing values, dropping specific columns, and creating a pie plot to visualize vaccination percentages for male and female individuals, revealing 53% of male individuals vaccinated compared to 47% for females. 
the chapter also demonstrates data manipulation techniques in python, identifies the top five states with the most vaccinated individuals, with maharashtra being the highest, and encourages viewers to find and visualize the bottom five vaccinated states.', 'chapters': [{'end': 2964.741, 'start': 2764.581, 'title': 'Covid cases and vaccination analysis', 'summary': "Discusses the trends of covid cases from march to july, with a rapid increase in cases from april and a subsequent decline after july. it also covers the analysis of vaccination data, highlighting the presence of errors and missing values, as well as the renaming of the 'updated on' column to 'vaccine date', and the overview of the dataset with 24 columns and 7845 entries.", 'duration': 200.16, 'highlights': ['The cases started to emerge rapidly after March, peaking in April and later dipping after July. It shows a rapid increase in Covid cases after March, peaking in April, and later declining after July.', 'The dataset for vaccination analysis contains errors and missing values. The vaccination dataset has errors and missing values, which is expected due to the variation in vaccination occurrences.', "The 'state' column contains data for 'India' in a few rows, which needs to be ignored. The 'state' column contains 'India' data in a few rows, and it needs to be ignored during the analysis.", "The 'updated on' column is renamed to 'vaccine date' using the rename function in the pandas library. The 'updated on' column is renamed to 'vaccine date' using the rename function in the pandas library.", 'The dataset consists of 24 columns, 7845 entries, and the info function provides an overview of the data types and memory usage. 
The dataset contains 24 columns, 7845 entries, and the info function offers an overview of the data types and memory usage.']}, {'end': 3410.582, 'start': 2966.929, 'title': 'Data analysis and visualization for vaccination', 'summary': 'Involves analyzing a dataset with missing values, dropping specific columns, and creating a pie plot to visualize vaccination percentages for male and female individuals, revealing that 53% of male individuals have been vaccinated compared to 47% for females.', 'duration': 443.653, 'highlights': ['53 percent male individuals have been vaccinated, compared to 47 percent for females. Quantifiable data showing the vaccination percentages for male and female individuals.', 'Dropped columns include doses administered, Sputnik V doses administered, AEFI, 18 to 44 years doses administered, 45 to 60 years doses administered, and 60 plus years doses administered. Detailed information about the specific columns that were dropped from the dataset.', 'The dataset contains missing values in almost all columns, including male and female individuals vaccinated and age groups. Key point about the prevalence of missing values in the dataset.']}, {'end': 3876.559, 'start': 3410.582, 'title': 'Data analysis: vaccination states', 'summary': 'Demonstrates data manipulation techniques in python, including removing rows with specific criteria, renaming columns, and creating visualizations to identify the top five states with the most vaccinated individuals, maharashtra being the highest, and encourages viewers to find and visualize the bottom five vaccinated states.', 'duration': 465.977, 'highlights': ['The top five states with the most vaccinated individuals are Maharashtra, Uttar Pradesh, Rajasthan, Gujarat, and West Bengal. 
Using Python, the chapter identifies the top five states with the most vaccinated individuals, with Maharashtra leading the list and provides specific state names and their respective positions.', 'Demonstration of creating a bar plot to visualize the top five vaccinated states in India. The chapter showcases the process of creating a bar plot in Python to visualize the top five vaccinated states in India, with Maharashtra having the highest number of vaccinations and the specific steps involved in creating the plot.', 'Encouragement for viewers to find and visualize the bottom five vaccinated states. The chapter encourages viewers to find and visualize the bottom five vaccinated states in India and requests them to share their code snippets in the comments section, offering to provide answers if needed.', 'Transition to the second project on COVID data analysis using Python. The chapter concludes the first project on COVID data analysis using Python and announces the transition to the second project, providing a smooth segue between the two projects.']}], 'duration': 1111.978, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs2764581.jpg', 'highlights': ['The cases started to emerge rapidly after March, peaking in April and later dipping after July.', 'The dataset for vaccination analysis contains errors and missing values.', '53 percent male individuals have been vaccinated, compared to 47 percent for females.', 'The top five states with the most vaccinated individuals are Maharashtra, Uttar Pradesh, Rajasthan, Gujarat, and West Bengal.', 'Demonstration of creating a bar plot to visualize the top five vaccinated states in India.']}, {'end': 5708.69, 'segs': [{'end': 3905.073, 'src': 'embed', 'start': 3877.179, 'weight': 0, 'content': [{'end': 3882.482, 'text': "For this project, we'll be using the Tableau software and a global coronavirus dataset.", 'start': 3877.179, 'duration': 5.303}, {'end': 3885.984, 
'text': "First, we'll look at the dataset and the fields we have.", 'start': 3883.403, 'duration': 2.581}, {'end': 3889.806, 'text': 'Okay, so I am in my dataset folder.', 'start': 3887.665, 'duration': 2.141}, {'end': 3895.209, 'text': 'We have already seen the first two datasets in our Python project.', 'start': 3890.847, 'duration': 4.362}, {'end': 3898.051, 'text': 'This time, we are going to use the global COVID data.', 'start': 3895.889, 'duration': 2.162}, {'end': 3898.791, 'text': 'Let me open it.', 'start': 3898.211, 'duration': 0.58}, {'end': 3901.37, 'text': 'all right.', 'start': 3900.529, 'duration': 0.841}, {'end': 3905.073, 'text': 'so here you can see we have the different country names.', 'start': 3901.37, 'duration': 3.703}], 'summary': 'Using tableau with global coronavirus dataset to analyze country-wise covid data.', 'duration': 27.894, 'max_score': 3877.179, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs3877179.jpg'}, {'end': 3998.485, 'src': 'embed', 'start': 3971.197, 'weight': 5, 'content': [{'end': 3986.759, 'text': "confirmed deaths and the recovered cases column and we'll be using the tableau software to create interesting visualizations and we'll convert those visuals and put it in the form of a dashboard towards the end of the project.", 'start': 3971.197, 'duration': 15.562}, {'end': 3991.961, 'text': 'okay, so let me close this file or let it be open.', 'start': 3986.759, 'duration': 5.202}, {'end': 3993.843, 'text': "i'll search for tableau public.", 'start': 3991.961, 'duration': 1.882}, {'end': 3998.485, 'text': 'you can see here i have tableau public installed.', 'start': 3993.843, 'duration': 4.642}], 'summary': 'Using tableau to visualize confirmed deaths and recovered cases data.', 'duration': 27.288, 'max_score': 3971.197, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs3971197.jpg'}, {'end': 4403.136, 'src': 
'embed', 'start': 4373.411, 'weight': 1, 'content': [{'end': 4382.274, 'text': 'so This horizontal bar graph has information only for these few countries that have the highest number of confirmed cases.', 'start': 4373.411, 'duration': 8.863}, {'end': 4387.915, 'text': 'Now you can go ahead and play with the color formatting.', 'start': 4382.974, 'duration': 4.941}, {'end': 4394.917, 'text': 'So here under colors, let me choose a green color.', 'start': 4388.675, 'duration': 6.242}, {'end': 4403.136, 'text': 'okay, now you can mark here, since we selected only the top 12 countries with the highest number of confirmed cases.', 'start': 4396.371, 'duration': 6.765}], 'summary': 'Horizontal bar graph displays top 12 countries with highest confirmed cases.', 'duration': 29.725, 'max_score': 4373.411, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs4373411.jpg'}, {'end': 5416.994, 'src': 'embed', 'start': 5375.505, 'weight': 4, 'content': [{'end': 5382.95, 'text': 'so here you have one column and on the right also, you will have one column with their own axis values.', 'start': 5375.505, 'duration': 7.445}, {'end': 5388.975, 'text': 'so this dual axis chart will be for recovered cases and the death cases.', 'start': 5382.95, 'duration': 6.025}, {'end': 5391.437, 'text': 'let me show you how to do it.', 'start': 5388.975, 'duration': 2.462}, {'end': 5406.186, 'text': "so i'll drag my country field onto columns and then I'll choose recovered column onto rows.", 'start': 5391.437, 'duration': 14.749}, {'end': 5416.994, 'text': "so here I have got my line chart and the second column I'm going to choose is the deaths column.", 'start': 5406.186, 'duration': 10.808}], 'summary': 'Creating a dual-axis chart for recovered and death cases, using country data.', 'duration': 41.489, 'max_score': 5375.505, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs5375505.jpg'}], 
'start': 3877.179, 'title': 'Visualizing global covid-19 data with tableau', 'summary': 'Covers using tableau to visualize and analyze global coronavirus data for 192 countries, including creating reports, visualizing covid-19 data, analyzing deaths and recoveries, and creating dual axis charts and dashboards.', 'chapters': [{'end': 4102.721, 'start': 3877.179, 'title': 'Using tableau for global covid data', 'summary': 'Discusses using tableau software to visualize and analyze a global coronavirus dataset containing information on 192 countries, including active cases, confirmed cases, deaths, and recovered cases, with a plan to create visualizations and dashboards using tableau public edition.', 'duration': 225.542, 'highlights': ['The dataset contains information on 192 countries, including columns for active cases, confirmed cases, deaths, mortality rate, and recovered cases, as of 12th March 2021.', 'The plan involves using Tableau Public edition to create visualizations and dashboards based on the country column, active, confirmed, deaths, and recovered cases.', 'The chapter also covers connecting the CSV dataset to Tableau, with guidance on using the Tableau Public edition and the different symbols representing geographic, date, text, and numeric fields.']}, {'end': 4430.244, 'start': 4102.721, 'title': 'Creating tableau reports', 'summary': 'Demonstrates loading data into tableau, creating a table to display the total number of confirmed covid-19 cases for each country, and converting the table into a sorted horizontal bar graph to visualize the top countries with the highest number of confirmed cases.', 'duration': 327.523, 'highlights': ['Creating a table to display the total number of confirmed cases for each country The speaker drags the country/region column onto rows and the confirmed column onto text to display a table with country names and the total confirmed cases, with data until March 2021.', 'Sorting the table in ascending and descending order 
based on confirmed cases The table is sorted in ascending and descending order based on the sum of confirmed cases, revealing the countries with the highest and lowest number of COVID-19 cases.', 'Converting the table into a sorted horizontal bar graph The table is converted into a horizontal bar graph and sorted in descending order to visualize the top countries with the highest number of confirmed COVID-19 cases.', 'Filtering and color formatting for the top countries The speaker filters the bar graph to display only the top 12 countries with the highest number of confirmed cases and demonstrates color formatting for visualization.']}, {'end': 4753.63, 'start': 4430.244, 'title': 'Visualizing covid-19 data', 'summary': 'Demonstrates how to convert a horizontal bar plot into a vertical bar plot, create a global map to represent covid-19 cases, and visualize the top 10 countries with the highest deaths using a tree map.', 'duration': 323.386, 'highlights': ['Creating a vertical bar plot from a horizontal bar plot The demonstration showcases the process of converting a horizontal bar plot into a vertical bar plot, providing a visual guide for data manipulation.', 'Creating a global map to represent COVID-19 cases The process of creating a global map to visualize COVID-19 cases, including the steps for mapping longitude, latitude, and country regions, is explained, demonstrating a comprehensive visualization of the data.', 'Visualizing the top 10 countries with the highest deaths using a tree map The chapter introduces the concept of visualizing the top 10 countries with the highest deaths due to the pandemic using a tree map, demonstrating the application of different visualization techniques for varied data sets.']}, {'end': 5044.663, 'start': 4753.63, 'title': 'Visualizing covid-19 data', 'summary': 'Demonstrates creating visualizations in tableau to display the top 10 countries with the highest number of deaths and recoveries, utilizing filters and color 
palettes.', 'duration': 291.033, 'highlights': ['Created a tree map visualization to display the top 10 countries with the highest number of deaths, with US, Brazil, Mexico, India, and United Kingdom being the countries with the highest reported deaths. Top 10 countries based on the total number of deaths.', 'Changed the color palette of the tree map visualization to gold purple diverging and red, green, white diverging to enhance visual representation. Modified the color palette for visual enhancement.', 'Generated a packed bubble chart to visualize the top 10 countries with the highest average number of recoveries, with India, Brazil, and Turkey being the countries with the highest average recoveries. Displayed top 10 countries based on the average number of recoveries.']}, {'end': 5342.142, 'start': 5044.663, 'title': 'Visual analysis of covid-19 data', 'summary': 'Explains the creation of scatter plots to analyze the total confirmed cases versus the total number of deaths, filtering the top 10 countries with the highest number of confirmed cases and deaths, and visualizing the mortality rate for each country.', 'duration': 297.479, 'highlights': ['The scatter plot shows the top 10 countries with the highest number of confirmed cases and deaths, with the US leading in both categories, followed by Brazil, Mexico, and India. Top 10 countries with the highest number of confirmed cases and deaths.', 'Visualization of mortality rates reveals countries with the highest mortality rate, such as Yemen, Mexico, Syria, and Sudan. Countries with the highest mortality rates.', 'The data was last updated in March, indicating it is not recent. 
Data last updated in March.']}, {'end': 5708.69, 'start': 5342.142, 'title': 'Creating dual axis chart in tableau', 'summary': 'Demonstrates the process of creating a dual axis chart in tableau, showcasing recovered and death cases with values in millions and thousands, formatting labels to display values in thousands, and building a final dashboard with visualizations and filters.', 'duration': 366.548, 'highlights': ['The dual axis chart in Tableau showcases recovered and death cases with values in millions and thousands, allowing for synchronization of axis values if required.', 'The process includes formatting labels to display values in thousands with a custom decimal places of zero, resulting in clearer representation of data such as 531k, 193k, 158k.', 'The final dashboard created in Tableau includes visualizations for global cases, top 10 confirmed cases, death cases, confirmed versus deaths, and the dual axis chart, with the option to customize further by adding filters.', 'The demonstration involves selecting a palette, applying it, and adjusting the visualization elements such as changing the recovered line to bars, altering colors, and synchronizing axis for a cohesive and visually appealing dashboard.']}], 'duration': 1831.511, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs3877179.jpg', 'highlights': ["Creating reports and visualizations using Tableau Public edition for 192 countries' COVID-19 data.", 'Sorting and converting tables into horizontal bar graphs to visualize top countries with highest confirmed COVID-19 cases.', 'Visualizing COVID-19 cases using global maps, tree maps, and packed bubble charts for varied data sets.', 'Utilizing scatter plots to display top countries with highest confirmed cases, deaths, and mortality rates.', 'Creating dual axis charts in Tableau to showcase recovered and death cases with synchronized axis values.', 'Creating a final dashboard in Tableau with 
visualizations for global cases, top 10 confirmed cases, death cases, and dual axis chart.']}, {'end': 6155.198, 'segs': [{'end': 5854.359, 'src': 'embed', 'start': 5786.198, 'weight': 1, 'content': [{'end': 5791.12, 'text': 'The first modern Olympics took place in Athens, Greece in 1896.', 'start': 5786.198, 'duration': 4.922}, {'end': 5801.825, 'text': 'As per National Geographic, the original Olympics took place in 776 BC, so they began as part of an ancient Greek festival which celebrated Zeus,', 'start': 5791.12, 'duration': 10.705}, {'end': 5803.266, 'text': 'the Greek god of sky and weather.', 'start': 5801.825, 'duration': 1.441}, {'end': 5811.341, 'text': 'The rings in the Olympics logo represent the five continents, Europe, Africa, Asia, the Americas and Oceania.', 'start': 5804.298, 'duration': 7.043}, {'end': 5820.304, 'text': 'From 1924 to 1992, the Winter and the Summer Olympics took place in the same year but now they alternate every two years.', 'start': 5812.321, 'duration': 7.983}, {'end': 5824.085, 'text': 'Before I move on, here is an interesting question for you.', 'start': 5821.404, 'duration': 2.681}, {'end': 5828.967, 'text': 'Only two people have ever won gold medals at both the Summer and the Winter Olympics.', 'start': 5824.826, 'duration': 4.141}, {'end': 5834.235, 'text': 'Who are those two people? 
Please share your answers in the comment section of the video.', 'start': 5830.047, 'duration': 4.188}, {'end': 5835.395, 'text': 'We would like to know from you.', 'start': 5834.395, 'duration': 1}, {'end': 5843.657, 'text': 'The Summer Olympics in Tokyo began on the 23rd of July and recently concluded on the 8th of August.', 'start': 5837.935, 'duration': 5.722}, {'end': 5851.558, 'text': 'We got to witness some thriller matches that went down to the wire, some amazing victories and sadly there were a lot of heartbreaks as well.', 'start': 5844.437, 'duration': 7.121}, {'end': 5854.359, 'text': 'Winning and losing are part and parcel of any game.', 'start': 5852.258, 'duration': 2.101}], 'summary': 'The modern olympics began in 1896, representing 5 continents. summer olympics in tokyo ended on 8th august.', 'duration': 68.161, 'max_score': 5786.198, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs5786198.jpg'}, {'end': 6099.516, 'src': 'embed', 'start': 6018.713, 'weight': 0, 'content': [{'end': 6022.715, 'text': 'So the results that you will see in the demo is purely based on the data that we have collected.', 'start': 6018.713, 'duration': 4.002}, {'end': 6034.039, 'text': 'So the file athlete underscore events dot csv contains nearly 271116 rows of information and there are 15 columns.', 'start': 6024.675, 'duration': 9.364}, {'end': 6039.942, 'text': 'So each row corresponds to an individual athlete competing in an individual Olympic event.', 'start': 6035.22, 'duration': 4.722}, {'end': 6045.019, 'text': 'so here id is actually a unique number for each athlete.', 'start': 6040.896, 'duration': 4.123}, {'end': 6048.182, 'text': "then we have the name, which is basically the athlete's name.", 'start': 6045.019, 'duration': 3.163}, {'end': 6051.725, 'text': 'we have the sex or the gender, which is male, or f for female.', 'start': 6048.182, 'duration': 3.543}, {'end': 6055.781, 'text': 'Then we have the 
age of the athlete, which is in terms of integers.', 'start': 6052.679, 'duration': 3.102}, {'end': 6060.264, 'text': 'We have the height in centimeters of the athletes.', 'start': 6056.822, 'duration': 3.442}, {'end': 6062.105, 'text': 'Then we have the weight in kilograms.', 'start': 6060.604, 'duration': 1.501}, {'end': 6063.766, 'text': 'We have the team name.', 'start': 6062.785, 'duration': 0.981}, {'end': 6065.247, 'text': 'These are the country names.', 'start': 6063.786, 'duration': 1.461}, {'end': 6066.968, 'text': 'We have China, Denmark, Netherlands.', 'start': 6065.267, 'duration': 1.701}, {'end': 6069.989, 'text': 'There are around 200 country names.', 'start': 6067.568, 'duration': 2.421}, {'end': 6071.851, 'text': 'Then we have the NOC.', 'start': 6070.85, 'duration': 1.001}, {'end': 6077.534, 'text': 'As I said NOC is a three letter code that stands for National Olympic Committee.', 'start': 6071.891, 'duration': 5.643}, {'end': 6081.269, 'text': 'we have the games.', 'start': 6079.909, 'duration': 1.36}, {'end': 6084.11, 'text': 'this games contains the year and the season.', 'start': 6081.269, 'duration': 2.841}, {'end': 6087.152, 'text': 'you can see here 1992, summer.', 'start': 6084.11, 'duration': 3.042}, {'end': 6090.433, 'text': 'you also have the winter Olympic information.', 'start': 6087.152, 'duration': 3.281}, {'end': 6091.833, 'text': 'all right.', 'start': 6090.433, 'duration': 1.4}, {'end': 6099.516, 'text': 'then we have a specific year column to tell you in which year this event had occurred.', 'start': 6091.833, 'duration': 7.683}], 'summary': 'Demo data: 271116 rows, 15 columns, 200 country names, 1992 summer and winter games', 'duration': 80.803, 'max_score': 6018.713, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs6018713.jpg'}], 'start': 5708.69, 'title': 'Olympics data analysis', 'summary': 'Covers the history of olympics, the recent tokyo olympics, and a project to 
perform exploratory data analysis using python, including specific python libraries and visualization techniques. it also discusses the use of olympics datasets, including athlete_events and noc_regions, with the athlete_events dataset containing 271116 rows and 15 columns, detailing individual athlete information such as gender, age, height, weight, team name, noc, games, year, season, city, sport, event, and medal.', 'chapters': [{'end': 5896.38, 'start': 5708.69, 'title': 'Olympics data analysis', 'summary': 'Covers the history of olympics, the recent tokyo olympics, and a project to perform exploratory data analysis using python, including specific python libraries and visualization techniques.', 'duration': 187.69, 'highlights': ['The Summer Olympics in Tokyo began on the 23rd of July and recently concluded on the 8th of August, witnessing thrilling matches and global participation.', 'The first modern Olympics took place in Athens, Greece in 1896, and the original Olympics took place in 776 BC as part of an ancient Greek festival celebrating Zeus.', 'From 1924 to 1992, the Winter and the Summer Olympics took place in the same year, but now they alternate every two years.']}, {'end': 6155.198, 'start': 5897.621, 'title': 'First olympics with all female athletes', 'summary': 'Discusses the use of olympics datasets, including athlete_events and noc_regions, with the athlete_events dataset containing 271116 rows and 15 columns, detailing individual athlete information such as gender, age, height, weight, team name, noc, games, year, season, city, sport, event, and medal.', 'duration': 257.577, 'highlights': ['The athlete_events dataset contains 271116 rows and 15 columns. The dataset is substantial, providing a large amount of information for analysis.', 'It includes details such as gender, age, height, weight, team name, NOC, games, year, season, city, sport, event, and medal. 
The dataset encompasses comprehensive information about individual athletes and their performances.', 'NOC stands for National Olympic Committee, providing a three-letter code for each country. The explanation clarifies the meaning and purpose of the NOC abbreviation, adding context to the dataset.']}], 'duration': 446.508, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs5708690.jpg', 'highlights': ['The athlete_events dataset contains 271116 rows and 15 columns, providing substantial information for analysis.', 'The Summer Olympics in Tokyo began on the 23rd of July and recently concluded on the 8th of August, witnessing global participation.', 'The first modern Olympics took place in Athens, Greece in 1896, and the original Olympics took place in 776 BC as part of an ancient Greek festival celebrating Zeus.', 'From 1924 to 1992, the Winter and the Summer Olympics took place in the same year, but now they alternate every two years.', 'The dataset encompasses comprehensive information about individual athletes and their performances, including gender, age, height, weight, team name, NOC, games, year, season, city, sport, event, and medal.', 'NOC stands for National Olympic Committee, providing a three-letter code for each country, adding context to the dataset.']}, {'end': 7542.297, 'segs': [{'end': 6208.784, 'src': 'embed', 'start': 6155.198, 'weight': 0, 'content': [{'end': 6158.301, 'text': 'means the athlete did not win any medals.', 'start': 6155.198, 'duration': 3.103}, {'end': 6160.563, 'text': "so we'll use these two data sets.", 'start': 6158.301, 'duration': 2.262}, {'end': 6162.805, 'text': "now let's get started with the demo.", 'start': 6160.563, 'duration': 2.242}, {'end': 6165.844, 'text': "we'll be using Jupyter notebook for our analysis.", 'start': 6163.703, 'duration': 2.141}, {'end': 6168.105, 'text': "So I'll take you to my Jupyter notebook right away.", 'start': 6166.244, 'duration': 
1.861}, {'end': 6170.727, 'text': "I've opened it on Chrome.", 'start': 6169.646, 'duration': 1.081}, {'end': 6173.688, 'text': 'So this is my Jupyter notebook that we are going to use.', 'start': 6171.067, 'duration': 2.621}, {'end': 6176.069, 'text': 'You can see here Olympics dataset analysis.', 'start': 6173.748, 'duration': 2.321}, {'end': 6183.973, 'text': 'I already have a few cells that have been filled with some code.', 'start': 6176.85, 'duration': 7.123}, {'end': 6188.856, 'text': 'And you can see there are some comments written as well.', 'start': 6185.134, 'duration': 3.722}, {'end': 6190.217, 'text': 'We are going to use this dataset.', 'start': 6188.896, 'duration': 1.321}, {'end': 6192.398, 'text': "So let's get started.", 'start': 6191.477, 'duration': 0.921}, {'end': 6195.854, 'text': "first and foremost we'll import the libraries.", 'start': 6193.752, 'duration': 2.102}, {'end': 6201.698, 'text': 'so we are going to use numpy, pandas, matplotlib and seaborn.', 'start': 6195.854, 'duration': 5.844}, {'end': 6208.784, 'text': 'let me hit shift enter to import all the libraries. all right, now.', 'start': 6201.698, 'duration': 7.086}], 'summary': 'An analysis using jupyter notebook with olympics dataset.', 'duration': 53.586, 'max_score': 6155.198, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs6155198.jpg'}, {'end': 6572.971, 'src': 'embed', 'start': 6539.201, 'weight': 2, 'content': [{'end': 6542.823, 'text': "I'll select athletes underscore df.", 'start': 6539.201, 'duration': 3.622}, {'end': 6545.605, 'text': "I'll use the shape attribute.", 'start': 6542.823, 'duration': 2.782}, {'end': 6550.263, 'text': "let's run it and see the result.", 'start': 6547.662, 'duration': 2.601}, {'end': 6551.243, 'text': 'you can see here.', 'start': 6550.263, 'duration': 0.98}, {'end': 6554.344, 'text': 'it gives me the total number of rows.', 'start': 6551.243, 'duration': 3.101}, {'end': 
6562.187, 'text': 'so 271,116 rows and earlier we had 15 columns in the first data set and now that we have added two more columns,', 'start': 6554.344, 'duration': 7.843}, {'end': 6568.629, 'text': 'so the total number of columns becomes 17 now, cool.', 'start': 6562.187, 'duration': 6.442}, {'end': 6572.971, 'text': "so now it's time to make the column names consistent.", 'start': 6568.629, 'duration': 4.342}], 'summary': 'Using the shape attribute, 271,116 rows and 17 columns were obtained, and the column names were made consistent.', 'duration': 33.77, 'max_score': 6539.201, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs6539201.jpg'}, {'end': 6802.134, 'src': 'embed', 'start': 6775.372, 'weight': 3, 'content': [{'end': 6783.776, 'text': 'and then we have the 25th percentile, the 50th percentile and the 75th percentile value of the columns ID, age, height, weight and year.', 'start': 6775.372, 'duration': 8.404}, {'end': 6793.707, 'text': 'alright now, one thing to notice here is, if you see the year column, the minimum year is 1896.', 'start': 6785.441, 'duration': 8.266}, {'end': 6801.393, 'text': 'so this is when Olympics started and, until recently, the Rio Olympics that was held in 2016.', 'start': 6793.707, 'duration': 7.686}, {'end': 6802.134, 'text': 'alright now, moving ahead.', 'start': 6801.393, 'duration': 0.741}], 'summary': 'Data includes 25th, 50th, and 75th percentiles for id, age, height, and weight; the year range spans from 1896 to 2016.', 'duration': 26.762, 'max_score': 6775.372, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs6775372.jpg'}, {'end': 6905.36, 'src': 'embed', 'start': 6872.937, 'weight': 4, 'content': [{'end': 6880.746, 'text': 'so if you mark here clearly so there are nearly six columns where we have missing values.', 'start': 6872.937, 'duration': 7.809}, {'end': 6888.81, 'text': 'you have age, height, 
weight, medal, region and notes, columns that have missing values.', 'start': 6880.746, 'duration': 8.064}, {'end': 6891.392, 'text': 'so hence it has given us true.', 'start': 6888.81, 'duration': 2.582}, {'end': 6895.174, 'text': 'the rest of the columns do not have any NaN or missing values.', 'start': 6891.392, 'duration': 3.782}, {'end': 6897.275, 'text': 'hence they are false.', 'start': 6895.174, 'duration': 2.101}, {'end': 6905.36, 'text': 'all right, let me scroll down now.', 'start': 6897.275, 'duration': 8.085}], 'summary': 'There are nearly six columns with missing values in the dataset.', 'duration': 32.423, 'max_score': 6872.937, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs6872937.jpg'}, {'end': 7175.9, 'src': 'embed', 'start': 7139.437, 'weight': 5, 'content': [{'end': 7150.003, 'text': 'so if you see here you can see all the details are for the athletes who are from Japan, all right now.', 'start': 7139.437, 'duration': 10.566}, {'end': 7159.115, 'text': 'moving ahead, Now I want to know the top 10 countries who have participated since the inception of Olympics in 1896.', 'start': 7150.003, 'duration': 9.112}, {'end': 7165.757, 'text': 'So for that I will create a variable called top underscore 10 underscore countries.', 'start': 7159.115, 'duration': 6.642}, {'end': 7167.718, 'text': 'I will say equal to.', 'start': 7165.757, 'duration': 1.961}, {'end': 7175.9, 'text': 'I will use my data frame, that is, athletes underscore df.', 'start': 7167.718, 'duration': 8.182}], 'summary': 'Analyzing details of japanese athletes and top 10 olympic participating countries.', 'duration': 36.463, 'max_score': 7139.437, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs7139437.jpg'}], 'start': 6155.198, 'title': 'Olympics data analysis and visualization', 'summary': 'Illustrates the process of importing and merging two datasets using pandas in a 
jupyter notebook for olympics data analysis, showing the use of libraries like numpy, pandas, matplotlib, and seaborn. it also covers the addition of columns, renaming inconsistent column names, checking for null values, and visualizing data through bar plots and histograms, with insights on the top 10 countries with the most olympic participants.', 'chapters': [{'end': 6474.084, 'start': 6155.198, 'title': 'Olympics data analysis', 'summary': 'Illustrates the process of importing and merging two datasets using pandas in a jupyter notebook for olympics data analysis, showing the use of libraries like numpy, pandas, matplotlib, and seaborn.', 'duration': 318.886, 'highlights': ['The process of importing and merging two datasets using Pandas in a Jupyter notebook for Olympics data analysis. The chapter demonstrates the steps of importing and merging two datasets using Pandas in a Jupyter notebook for Olympics data analysis.', 'Use of libraries like NumPy, Pandas, Matplotlib, and Seaborn for the analysis. 
The analysis involves the use of libraries such as NumPy, Pandas, Matplotlib, and Seaborn.']}, {'end': 6802.134, 'start': 6474.084, 'title': 'Data frame transformation and analysis', 'summary': 'Covers the addition of columns to a data set, renaming inconsistent column names, displaying data frame shape and information, and using the describe method to calculate statistical information, including the minimum and maximum values of the year column, ranging from 1896 to 2016.', 'duration': 328.05, 'highlights': ['The total number of rows in the data frame is 2,71,116, and after adding two more columns, the total number of columns becomes 17.', 'Using the describe method, statistical information like the minimum and maximum values of the year column, ranging from 1896 to 2016, can be obtained.', 'The info method provides details about the data frame, including the total number of columns, the number of rows or entries, non-null values, data types of the columns, and memory usage.', 'The rename function is used to make the column names consistent, ensuring that all column names start with a capital letter, and the changes are reflected in the data frame.']}, {'end': 6980.111, 'start': 6806.82, 'title': 'Checking for null values in dataset', 'summary': 'Covers the process of checking for null values in a dataset, identifying six columns with missing values, and quantifying the null values in the age, height, and weight columns.', 'duration': 173.291, 'highlights': ['Identified six columns with missing values: age, height, weight, medal, region, and notes, with 6 columns showing true for missing values.', 'Found 9474 null values in the age column, followed by null values in the height and weight columns.', 'Highlighted that the medal column may have null values for athletes who did not win any medals.']}, {'end': 7542.297, 'start': 6980.111, 'title': 'Data analysis and visualization', 'summary': 'Covers identifying and printing column names with null values, filtering 
data for specific countries, listing the top 10 countries with the most olympic participants, and visualizing the data through a bar plot and age distribution histogram.', 'duration': 562.186, 'highlights': ["Listing the top 10 countries with the most Olympic participants The instructor creates a variable 'top_10_countries' using the value_counts function and displays the top 10 countries with the most participants since 1896, with the United States having the highest number.", 'Filtering data for specific countries The instructor demonstrates how to filter data to display the details of athletes from specific countries, such as India and Japan, using the query function.', 'Identifying and printing column names with null values The instructor mentions identifying and printing the six columns with null values and emphasizes the need to print these column names in the form of a list.', 'Visualizing the age distribution of athletes through a histogram The instructor uses the matplotlib library to create a histogram, displaying the age distribution of athletes from the dataset, with bins ranging from 10 to 80 and a color palette of orange with white edge color.', 'Creating a bar plot to visualize the top 10 countries with the most Olympic participants The instructor creates a bar plot using the seaborn library, visually representing the top 10 countries with the highest participation in the Olympics, with the United States leading the count.']}], 'duration': 1387.099, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs6155198.jpg', 'highlights': ['The process of importing and merging two datasets using Pandas in a Jupyter notebook for Olympics data analysis.', 'Use of libraries like NumPy, Pandas, Matplotlib, and Seaborn for the analysis.', 'The total number of rows in the data frame is 2,71,116, and after adding two more columns, the total number of columns becomes 17.', 'Using the describe method, statistical 
information like the minimum and maximum values of the year column, ranging from 1896 to 2016, can be obtained.', 'Identified six columns with missing values: age, height, weight, medal, region, and notes, with 6 columns showing true for missing values.', "Listing the top 10 countries with the most Olympic participants The instructor creates a variable 'top_10_countries' using the value_counts function and displays the top 10 countries with the most participants since 1896, with the United States having the highest number."]}, {'end': 9453.063, 'segs': [{'end': 7602.519, 'src': 'embed', 'start': 7572.637, 'weight': 0, 'content': [{'end': 7580.003, 'text': 'you can see here so, early 20s, we have maximum number of athletes participating in the Olympics.', 'start': 7572.637, 'duration': 7.366}, {'end': 7584.246, 'text': 'we also have a few athletes who are beyond 40 years of age.', 'start': 7580.003, 'duration': 4.243}, {'end': 7595.714, 'text': 'you can see here we have a few athletes even closer to 60 also, and similarly we have a few athletes who are under 18 years of age.', 'start': 7584.246, 'duration': 11.468}, {'end': 7599.537, 'text': 'all right now, you can see here the bins.', 'start': 7595.714, 'duration': 3.823}, {'end': 7602.519, 'text': 'we had taken orange color.', 'start': 7599.537, 'duration': 2.982}], 'summary': 'In the olympics, there is a diverse age range of athletes, with the highest participation in early 20s and a notable presence of both younger and older athletes.', 'duration': 29.882, 'max_score': 7572.637, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs7572637.jpg'}, {'end': 8070.911, 'src': 'embed', 'start': 8043.045, 'weight': 1, 'content': [{'end': 8047.146, 'text': "now we'll run it and see the result.", 'start': 8043.045, 'duration': 4.101}, {'end': 8048.167, 'text': 'there you go.', 'start': 8047.146, 'duration': 1.021}, {'end': 8055.258, 'text': 'so you can see here from 
1900, 1904, all these years, you can see the female participation.', 'start': 8048.167, 'duration': 7.091}, {'end': 8062.027, 'text': 'let me change it to tail so that we have the recent data of the Olympics.', 'start': 8056.184, 'duration': 5.843}, {'end': 8065.168, 'text': 'you can see it here for the Beijing Olympics, 5816 women athletes participated.', 'start': 8062.027, 'duration': 3.141}, {'end': 8067.249, 'text': 'in 2012, London Olympics, we had 5815 women athletes.', 'start': 8065.168, 'duration': 2.081}, {'end': 8070.911, 'text': 'similarly, for the 2016 Rio Olympics, we had more participation than the London Olympics.', 'start': 8067.249, 'duration': 3.662}], 'summary': 'Female olympic participation increased from 1900 to 2016, with 5816 women in beijing, 5815 in london, and more in rio.', 'duration': 27.866, 'max_score': 8043.045, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs8043045.jpg'}, {'end': 8861.776, 'src': 'embed', 'start': 8825.234, 'weight': 2, 'content': [{'end': 8834.041, 'text': 'so in Rio Olympics, United States secured the most number of gold medals.', 'start': 8825.234, 'duration': 8.807}, {'end': 8840.246, 'text': 'now the reason this is 137 is we have also counted the team events, for example basketball.', 'start': 8834.041, 'duration': 6.205}, {'end': 8847.838, 'text': 'similarly, Great Britain had 64 gold medals in total, Russia 50.', 'start': 8841.491, 'duration': 6.347}, {'end': 8849.52, 'text': 'we have Brazil 34, Argentina 21, France 20 and Japan 17.', 'start': 8847.838, 'duration': 1.682}, {'end': 8861.776, 'text': 'all 
right now, using the above result, we are going to create a bar plot.', 'start': 8849.52, 'duration': 12.256}], 'summary': 'In the rio olympics, the united states won 137 gold medals, with other countries like great britain, russia, brazil, argentina, france, and japan also achieving notable medal counts.', 'duration': 36.542, 'max_score': 8825.234, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs8825234.jpg'}, {'end': 9187.661, 'src': 'embed', 'start': 9158.823, 'weight': 3, 'content': [{'end': 9164.286, 'text': 'so if I scroll down, you can see here we have a nice scatter plot.', 'start': 9158.823, 'duration': 5.463}, {'end': 9169.19, 'text': 'on the top you have the title of the plot saying height versus weight of Olympic medalists.', 'start': 9164.286, 'duration': 4.904}, {'end': 9173.132, 'text': 'you can see our age as the hue.', 'start': 9169.19, 'duration': 3.942}, {'end': 9179.376, 'text': 'so blue points are for the male athletes and orange points are for the female athletes.', 'start': 9173.132, 'duration': 6.244}, {'end': 9187.661, 'text': 'and on the y-axis we have the weight in terms of kilograms and you have the height in terms of centimeters.', 'start': 9179.376, 'duration': 8.285}], 'summary': 'Scatter plot comparing height and weight of olympic medalists by gender and age.', 'duration': 28.838, 'max_score': 9158.823, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs9158823.jpg'}, {'end': 9246.939, 'src': 'embed', 'start': 9219.147, 'weight': 4, 'content': [{'end': 9223.772, 'text': 'Indian Premier League that is IPL is one of the most popular and exciting events in world cricket.', 'start': 9219.147, 'duration': 4.625}, {'end': 9227.296, 'text': 'IPL 2022 will begin on 26th of March.', 'start': 9224.653, 'duration': 2.643}, {'end': 9235.945, 'text': "So it's the best time for us and our viewers to learn about how to perform exploratory 
data analysis using the recently held IPL auction dataset.", 'start': 9227.816, 'duration': 8.129}, {'end': 9243.655, 'text': 'So in this video, all the results and insights that you will see are totally based on the data we have collected from Kaggle.', 'start': 9237.028, 'duration': 6.627}, {'end': 9246.939, 'text': 'We will leave a link to the dataset in the description of the video.', 'start': 9244.216, 'duration': 2.723}], 'summary': 'IPL 2022 starts on 26th march, exploring data insights from recent auction.', 'duration': 27.792, 'max_score': 9219.147, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs9219147.jpg'}, {'end': 9318.68, 'src': 'embed', 'start': 9288.036, 'weight': 5, 'content': [{'end': 9292.079, 'text': 'so here you can see I am importing the pandas library, the numpy library.', 'start': 9288.036, 'duration': 4.043}, {'end': 9294.461, 'text': 'these are for data manipulation and numerical computation.', 'start': 9292.079, 'duration': 2.382}, {'end': 9298.244, 'text': 'Then we have Seaborn and Matplotlib for data visualization.', 'start': 9295.021, 'duration': 3.223}, {'end': 9309.633, 'text': 'Then I have imported this warnings module so that we do not get any warnings while running the code.', 'start': 9298.924, 'duration': 10.709}, {'end': 9314.096, 'text': 'Cool. So let me just hit shift enter to run the first cell.', 'start': 9310.233, 'duration': 3.863}, {'end': 9318.68, 'text': 'So here we will import all the libraries.', 'start': 9315.417, 'duration': 3.263}], 'summary': 'Importing pandas, numpy, seaborn, and matplotlib for data manipulation, computation, and visualization, while avoiding warnings.', 'duration': 30.644, 'max_score': 9288.036, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs9288036.jpg'}], 'start': 7542.297, 'title': 'Olympic data analysis', 'summary': "Analyzes athletes' age distribution, gender analysis, gold 
medalists, 2016 rio olympics data, filtering and visualizing medalist data, and ipl 2022 dataset. it includes insights such as the majority of athletes being in their early 20s, more male participants than female, and usa leading in gold medals.", 'chapters': [{'end': 7713.704, 'start': 7542.297, 'title': "Analysis of athletes' age distribution", 'summary': "Presents a histogram illustrating the distribution of athletes' ages, showing that the majority of athletes are in their early 20s, with the most participants aged between 20 to 30. it also compares the number of sports in the summer and winter olympics, revealing more sports in the summer olympics.", 'duration': 171.407, 'highlights': ['The majority of athletes are in their early 20s, with the most participants aged between 20 to 30. The histogram shows that the highest number of athletes fall within the age range of 20 to 30, indicating a significant concentration of athletes in their early 20s.', 'Comparison of sports in the Summer and Winter Olympics reveals more sports in the Summer Olympics. The analysis of the unique sports in the Summer and Winter Olympics indicates that there are more sports in the Summer Olympics compared to the Winter Olympics.', 'The Winter Olympic Games are held once every four years for sports practiced on snow and ice. The frequency of the Winter Olympic Games is highlighted, occurring once every four years and featuring sports practiced on snow and ice.']}, {'end': 8227.539, 'start': 7713.704, 'title': 'Olympics gender analysis', 'summary': 'Analyzes the gender distribution of athletes in the olympics from 1896 to 2016, finding that there are more male participants than female participants, and showcases the trends in female athlete participation and medal wins through count plots and line graphs.', 'duration': 513.835, 'highlights': ['The gender distribution of athletes in the Olympics shows more male participants than female participants. 
Since the inception of Olympics, there have been more male participants than female participants.', 'A pie chart illustrates the distribution of male and female participation, with 72.5% male and 27.5% female athletes. The pie chart visually represents the gender distribution, with 72.5% male and 27.5% female participation.', 'The analysis shows the trends in female athlete participation over time, indicating a continuous increase since 1980. A line graph demonstrates the trend of female athlete participation over time, showing a continuous increase since 1980.', 'The count plot reveals that 2016 had the highest number of female athlete participation. The count plot displays the highest number of female athlete participation in 2016.']}, {'end': 8691.72, 'start': 8227.539, 'title': 'Analyzing gold medal athletes', 'summary': 'Analyzes the data to identify athletes who have won gold medals, with a focus on those over 60 years old, and provides insights on the countries with the most gold medals, with usa leading with the highest count.', 'duration': 464.181, 'highlights': ['A total of six athletes have won a gold medal at an age exceeding 60 years, showcasing the rarity of such achievements.', 'USA has secured the most number of gold medals, followed by Russia, Germany, UK, and Italy.', 'A count plot visualizes the distribution of gold medals for athletes over 60 years, with archery having the highest count at three medals.']}, {'end': 8916.91, 'start': 8691.72, 'title': '2016 rio olympics data analysis', 'summary': "Analyzes the medal distribution of the 2016 rio summer olympics, highlighting the top countries with the most gold medals and creating visualizations of the distribution of medal-winning athletes' height and weight.", 'duration': 225.19, 'highlights': ['The United States secured 137 gold medals, followed by Great Britain with 64 and Russia with 50, showcasing the top-performing countries in the 2016 Rio Olympics.', 'A horizontal bar plot was created to 
display the top 20 nations with the most gold medals in the 2016 Rio Olympics, providing a visual representation of the medal distribution.', 'A scatter plot was generated to visualize the height and weight of male and female athletes who won medals in the 2016 Rio Olympics, allowing for a comparison of physical attributes among medal-winning athletes.']}, {'end': 9235.945, 'start': 8916.91, 'title': 'Filtering and visualizing olympic medalist data', 'summary': 'Demonstrates filtering data for olympic athletes who have won a medal, creating a scatter plot to visualize the height and weight of medalists, and concludes with an introduction to exploratory data analysis using the ipl auction dataset.', 'duration': 319.035, 'highlights': ['Creating a scatter plot to visualize the height and weight of Olympic medalists The chapter demonstrates the creation of a scatter plot to visualize the height and weight of Olympic medalists, filtering data to include only athletes who have won a medal, and using hue to differentiate between male and female athletes.', 'Introduction to exploratory data analysis using the IPL auction dataset The chapter concludes with an introduction to exploratory data analysis using the IPL auction dataset, highlighting the upcoming IPL 2022 and its significance in world cricket.', 'Filtering data for athletes who have won a medal The chapter discusses filtering data to include only athletes who have won a medal, addressing the presence of null values in the medals column, and creating a variable to filter out null medal values.']}, {'end': 9453.063, 'start': 9237.028, 'title': 'Analyzing ipl 2022 dataset using python and jupyter notebook', 'summary': 'Involves analyzing the ipl 2022 dataset using python and jupyter notebook, including importing libraries, loading the dataset, and exploring key columns and player information.', 'duration': 216.035, 'highlights': ['Importing libraries for data manipulation, numerical computation, and data 
visualization using Python and Jupyter Notebook. The speaker imports pandas, numpy, Seaborn, Matplotlib, and warnings module for data manipulation, numerical computation, and data visualization.', 'Loading the IPL 2022 dataset using the read_csv function from the pandas library. The speaker loads the IPL 2022 dataset using the read_csv function from the pandas library, specifying the dataset location and name.', 'Exploring the dataset using the head function to display the columns and player information, identifying unnecessary columns and key player attributes such as base price, player type, and playing history. The speaker uses the head function to display the dataset columns, identifies unnecessary columns, and highlights key player attributes such as base price, player type, and playing history.']}], 'duration': 1910.766, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs7542297.jpg', 'highlights': ['The majority of athletes are in their early 20s, with the most participants aged between 20 to 30.', 'More male participants than female in the Olympics, with 72.5% male and 27.5% female athletes.', 'USA leading in gold medals with 137, followed by Great Britain with 64 and Russia with 50 in the 2016 Rio Olympics.', 'A scatter plot was generated to visualize the height and weight of male and female athletes who won medals in the 2016 Rio Olympics.', 'Introduction to exploratory data analysis using the IPL auction dataset for IPL 2022 and its significance in world cricket.', 'Importing libraries for data manipulation, numerical computation, and data visualization using Python and Jupyter Notebook.']}, {'end': 10043.757, 'segs': [{'end': 9479.84, 'src': 'embed', 'start': 9453.063, 'weight': 3, 'content': [{'end': 9459.546, 'text': "now let's see the total number of players that are present in the data set and the total columns that are there.", 'start': 9453.063, 'duration': 6.483}, {'end': 9465.622, 'text': "for 
this I am going to use the shape attribute, so I'll write IPL dot shape.", 'start': 9459.546, 'duration': 6.076}, {'end': 9473.992, 'text': 'you can see here there are total 633 players that were part of the auction, and this includes the players who were draft pick also,', 'start': 9465.622, 'duration': 8.37}, {'end': 9479.84, 'text': 'and there are total 8 columns in my data frame IPL.', 'start': 9473.992, 'duration': 5.848}], 'summary': '633 players in dataset, 8 columns in ipl data frame.', 'duration': 26.777, 'max_score': 9453.063, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs9453063.jpg'}, {'end': 9572.526, 'src': 'embed', 'start': 9540.954, 'weight': 0, 'content': [{'end': 9548.959, 'text': 'now, as I said, the first column, which is unnamed colon 0, is actually unnecessary.', 'start': 9540.954, 'duration': 8.005}, {'end': 9550.46, 'text': 'so we are going to drop it.', 'start': 9548.959, 'duration': 1.501}, {'end': 9560.482, 'text': 'for that we are going to use the dot drop function and then I have passed in my column name, followed by the axis.', 'start': 9550.46, 'duration': 10.022}, {'end': 9562.403, 'text': 'on which axis is it there?', 'start': 9560.482, 'duration': 1.921}, {'end': 9572.526, 'text': 'and then I say in place equal to true, which means it will replace and permanently remove the unnamed column from the data frame.', 'start': 9562.403, 'duration': 10.123}], 'summary': 'Dropped unnecessary column using dot drop function, permanently removed from the dataframe.', 'duration': 31.572, 'max_score': 9540.954, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs9540954.jpg'}, {'end': 9731.548, 'src': 'embed', 'start': 9694.87, 'weight': 1, 'content': [{'end': 9700.154, 'text': 'so for that i am going to use the fillna function, and here i passed in 0.', 'start': 9694.87, 'duration': 5.284}, {'end': 9711.313, 'text': 'so all the null 
values in the data set will be replaced with 0 for cost in rupees column and cost in dollars column.', 'start': 9700.154, 'duration': 11.159}, {'end': 9724.222, 'text': 'let me just run it alright, okay, now we will see the players who were unsold in the 2021 auction,', 'start': 9711.313, 'duration': 12.909}, {'end': 9731.548, 'text': 'which means we will check the rows that have null values for the 2021 squad column.', 'start': 9724.222, 'duration': 7.326}], 'summary': 'Using the fillna function with 0 to replace null values in cost columns and identifying unsold players in the 2021 auction.', 'duration': 36.678, 'max_score': 9694.87, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs9694870.jpg'}, {'end': 9921.414, 'src': 'embed', 'start': 9886.063, 'weight': 2, 'content': [{'end': 9893.091, 'text': 'but before that we are going to slice our data frame.', 'start': 9886.063, 'duration': 7.028}, {'end': 9902.801, 'text': 'so here I am looking for all the rows where cost in rupees is greater than zero, and those should be unique.', 'start': 9893.091, 'duration': 9.71}, {'end': 9907.162, 'text': "and I'm storing the value in a variable called teams.", 'start': 9902.801, 'duration': 4.361}, {'end': 9915.168, 'text': 'so here you can see we have the different teams where the cost in rupees is greater than zero.', 'start': 9907.162, 'duration': 8.006}, {'end': 9921.414, 'text': "now I'm going to create another column called status.", 'start': 9915.168, 'duration': 6.246}], 'summary': 'Slicing data frame to find unique rows with cost in rupees greater than zero and creating a new column called status.', 'duration': 35.351, 'max_score': 9886.063, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs9886063.jpg'}], 'start': 9453.063, 'title': 'Ipl dataset exploration and cleaning', 'summary': "Delves into the ipl dataset, revealing 633 players in the auction, 8 
columns, and the process of removing an unnecessary column. it also demonstrates identifying and handling null values, adding a new 'status' column, and checking for duplicated player entries.", 'chapters': [{'end': 9599.049, 'start': 9453.063, 'title': 'Exploring ipl data', 'summary': 'Explores the ipl dataset, revealing 633 players in the auction, 8 columns in the data frame, and the process of removing an unnecessary column, as well as checking null values in the dataset.', 'duration': 145.986, 'highlights': ['The dataset comprises a total of 633 players from the auction, including draft picks, and consists of 8 columns in the data frame.', "The process of removing the unnecessary first column, 'unnamed: 0', using the dot drop function and confirming its successful removal from the data frame.", 'Using the info function to display the total non-null objects, data types, and memory usage, revealing 633 entries and 8 columns in the dataset.']}, {'end': 10043.757, 'start': 9599.049, 'title': 'Data cleaning and analysis in ipl dataset', 'summary': "Demonstrates the process of identifying and handling null values, replacing them with zeros or 'not participated', and adding a new 'status' column to the dataset based on specific conditions, while also checking for duplicated player entries in the ipl dataset.", 'duration': 444.708, 'highlights': ["All the null values in the dataset for 'cost in rupees' and 'cost in dollars' were replaced with zeros using the fillna function, resulting in 396 null values being replaced. Using the fillna function with a value of 0, all the null values in the 'cost in rupees' and 'cost in dollars' columns were successfully replaced, resulting in 396 null values being replaced.", "The '2021 squad' column was updated to replace all null values with 'not participated', ensuring that none of the columns in the IPL data frame have null values. 
The '2021 squad' column was updated to replace all null values with the text value 'not participated', effectively eliminating any remaining null values in the dataset.", "A new column called 'status' was added to the dataset, containing values 'sold' and 'unsold' based on the condition that the 'cost in rupees' is greater than zero, resulting in unique team values being stored in a variable called 'teams'. A new column 'status' was added to the dataset, containing values 'sold' and 'unsold' based on the condition that the 'cost in rupees' is greater than zero, resulting in unique team values being stored in a variable called 'teams'.", "The process of identifying and handling duplicated player entries in the IPL dataset was demonstrated using the 'duplicated' function, revealing instances where players with the same name are actually different individuals. The 'duplicated' function was utilized to identify and address duplicated player entries in the IPL dataset, highlighting instances where players with the same name are actually different individuals."]}], 'duration': 590.694, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs9453063.jpg', 'highlights': ["The process of removing the unnecessary first column, 'unnamed: 0', using the dot drop function and confirming its successful removal from the data frame.", "All the null values in the dataset for 'cost in rupees' and 'cost in dollars' were replaced with zeros using the fillna function, resulting in 396 null values being replaced.", "A new column called 'status' was added to the dataset, containing values 'sold' and 'unsold' based on the condition that the 'cost in rupees' is greater than zero, resulting in unique team values being stored in a variable called 'teams'.", 'The dataset comprises a total of 633 players from the auction, including draft picks, and consists of 8 columns in the data frame.']}, {'end': 11356.748, 'segs': [{'end': 10100.841, 'src': 
'embed', 'start': 10072.648, 'weight': 0, 'content': [{'end': 10078.689, 'text': "And within square brackets, I'm going to say 0, which means it will give me the total number of rows.", 'start': 10072.648, 'duration': 6.041}, {'end': 10083.471, 'text': 'And you can see 633 players were part of the auction.', 'start': 10080.05, 'duration': 3.421}, {'end': 10086.772, 'text': 'This is as per our data that we have collected from Kaggle.', 'start': 10083.891, 'duration': 2.881}, {'end': 10093.313, 'text': 'Up next, we are going to find how many types of players have participated.', 'start': 10088.809, 'duration': 4.504}, {'end': 10100.841, 'text': "So I'm going to say types equal to say IPL.", 'start': 10094.995, 'duration': 5.846}], 'summary': 'A total of 633 players participated in the auction, sourced from kaggle.', 'duration': 28.193, 'max_score': 10072.648, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs10072648.jpg'}, {'end': 10169.351, 'src': 'embed', 'start': 10137.036, 'weight': 1, 'content': [{'end': 10137.657, 'text': "So let's run it.", 'start': 10137.036, 'duration': 0.621}, {'end': 10142.42, 'text': 'You can see here we have the different types of players.', 'start': 10138.437, 'duration': 3.983}, {'end': 10146.523, 'text': 'So all rounders, there are total 242 all rounders who are part of the auction.', 'start': 10142.6, 'duration': 3.923}, {'end': 10151.747, 'text': 'Then we have 215 bowlers, 112 batsmen, and 64 wicket keepers.', 'start': 10146.984, 'duration': 4.763}, {'end': 10163.216, 'text': "Now using the above table that we just created, I'm going to plot my first pie chart.", 'start': 10153.208, 'duration': 10.008}, {'end': 10168.351, 'text': "so for that I'm going to use the matplotlib pie function.", 'start': 10164.79, 'duration': 3.561}, {'end': 10169.351, 'text': 'so plt.pie.', 'start': 10168.351, 'duration': 1}], 'summary': '242 all rounders, 215 bowlers, 112 batsmen, and 64 wicket 
keepers in the auction.', 'duration': 32.315, 'max_score': 10137.036, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs10137036.jpg'}, {'end': 10388.189, 'src': 'embed', 'start': 10358.64, 'weight': 2, 'content': [{'end': 10364.242, 'text': "if I run this, you can see here I've successfully grouped the unsold and sold players.", 'start': 10358.64, 'duration': 5.602}, {'end': 10368.643, 'text': 'the values are matching with the bar plot that we just created.', 'start': 10364.242, 'duration': 4.401}, {'end': 10370.644, 'text': 'so sold were 237 players and unsold are 396.', 'start': 10368.643, 'duration': 2.001}, {'end': 10383.725, 'text': 'cool, now we are going to see the total number of players bought by each team.', 'start': 10370.644, 'duration': 13.081}, {'end': 10388.189, 'text': 'again this is going to be a count plot, so basically a bar plot.', 'start': 10383.725, 'duration': 4.464}], 'summary': 'Successfully grouped 237 sold and 396 unsold players, and creating a count plot for players bought by each team.', 'duration': 29.549, 'max_score': 10358.64, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs10358640.jpg'}, {'end': 11020.736, 'src': 'embed', 'start': 10956.155, 'weight': 3, 'content': [{'end': 10957.196, 'text': 'let me run it again.', 'start': 10956.155, 'duration': 1.041}, {'end': 10960.517, 'text': 'there you go, so you can see.', 'start': 10957.196, 'duration': 3.321}, {'end': 10973.284, 'text': 'Mumbai Indians have spent 15.25 crores for one of the players, and this player happens to be Ishan Kishan.', 'start': 10960.517, 'duration': 12.767}, {'end': 10981.315, 'text': 'then Chennai Super Kings have spent the highest amount for a player who happens to be a bowler.', 'start': 10973.284, 'duration': 8.031}, {'end': 10982.416, 'text': 'the name is Deepak Chahar.', 'start': 10981.315, 'duration': 1.101}, {'end': 10984.337, 'text': 'they 
have bought him for 14 crores.', 'start': 10982.416, 'duration': 1.921}, {'end': 10991.902, 'text': 'Kolkata Knight Riders bought Shreyas Iyer for 12.25 crores and so on and so forth.', 'start': 10984.877, 'duration': 7.025}, {'end': 10997.646, 'text': 'So these are the highest amounts spent on a single player by each of the teams.', 'start': 10992.643, 'duration': 5.003}, {'end': 11000.648, 'text': 'Let me scroll down.', 'start': 10999.887, 'duration': 0.761}, {'end': 11010.13, 'text': 'now we are going to see the player retained at maximum value or maximum price.', 'start': 11003.186, 'duration': 6.944}, {'end': 11020.736, 'text': 'now, if you follow IPL, the maximum price at which a player was retained happens to be Ravindra Jadeja.', 'start': 11010.13, 'duration': 10.606}], 'summary': 'Mumbai indians spent 15.25 crores on ishan kishan, chennai super kings spent 14 crores on deepak chahar, and kolkata knight riders bought shreyas iyer for 12.25 crores. ravindra jadeja was retained at the maximum price.', 'duration': 64.581, 'max_score': 10956.155, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs10956155.jpg'}, {'end': 11139.04, 'src': 'embed', 'start': 11106.484, 'weight': 5, 'content': [{'end': 11124.512, 'text': "cool, now the next cell of code will display or print the top 5 bowlers who have been picked in this year's IPL based on their cost in INR.", 'start': 11106.484, 'duration': 18.028}, {'end': 11127.594, 'text': 'let me just run it and show you the result.', 'start': 11124.512, 'duration': 3.082}, {'end': 11128.314, 'text': 'you can see here.', 'start': 11127.594, 'duration': 0.72}, {'end': 11130.375, 'text': 'Deepak Chahar went for 14 crores.', 'start': 11128.314, 'duration': 2.061}, {'end': 11139.04, 'text': 'then we have Shardul Thakur, who was bought by Delhi Capitals for 10.75 crores; earlier he was part of CSK.', 'start': 11130.375, 'duration': 8.665}], 'summary': 'Top 5 ipl bowlers bought: 
deepak chahar for 14 crores, shardul thakur for 10.75 crores.', 'duration': 32.556, 'max_score': 11106.484, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs11106484.jpg'}, {'end': 11255.197, 'src': 'embed', 'start': 11187.745, 'weight': 6, 'content': [{'end': 11198.589, 'text': 'you can see here Shreyas Iyer was bought by Kolkata Knight Riders for 12.25 crores; last year he was part of Delhi Capitals.', 'start': 11187.745, 'duration': 10.844}, {'end': 11204.311, 'text': "then you have Shimron Hetmyer, who's from West Indies.", 'start': 11198.589, 'duration': 5.722}, {'end': 11209.478, 'text': 'he has been picked by the Rajasthan Royals for 8.5 crores.', 'start': 11204.311, 'duration': 5.167}, {'end': 11214.866, 'text': 'then we have Rahul Tripathi, Shikhar Dhawan and Devdutt Padikkal.', 'start': 11209.478, 'duration': 5.388}, {'end': 11223.033, 'text': "now, if you wanted to check for, let's say, all-rounders, you can replace batter with all-rounder.", 'start': 11214.866, 'duration': 8.167}, {'end': 11234.078, 'text': 'if I run this, you can see Liam Livingstone has been bought by Punjab Kings this season for 11.5 crores.', 'start': 11223.033, 'duration': 11.045}, {'end': 11239.22, 'text': 'his base price was just 1 crore and last year he was part of the Rajasthan Royals.', 'start': 11234.078, 'duration': 5.142}, {'end': 11241.781, 'text': 'then we have a player from Sri Lanka, Hasaranga.', 'start': 11239.22, 'duration': 2.561}, {'end': 11255.197, 'text': 'then we have Harshal Patel, Shah Rukh Khan and Rahul highest amount cool now.', 'start': 11241.781, 'duration': 13.416}], 'summary': 'Shreyas iyer sold for 12.25 crores to kkr, liam livingstone bought for 11.5 crores by punjab kings, and shimron hetmyer acquired for 8.5 crores by rajasthan royals.', 'duration': 67.452, 'max_score': 11187.745, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs11187745.jpg'}, 
{'end': 11340.815, 'src': 'embed', 'start': 11310.41, 'weight': 8, 'content': [{'end': 11312.431, 'text': "let's say, Suresh Raina played for CSK.", 'start': 11310.41, 'duration': 2.021}, {'end': 11315.393, 'text': 'Steve Smith was part of the Delhi Capitals.', 'start': 11312.431, 'duration': 2.962}, {'end': 11318.475, 'text': 'Shakib Al Hasan was part of KKR.', 'start': 11315.393, 'duration': 3.082}, {'end': 11322.177, 'text': 'if I scroll down, you have Dawid Malan, who played for Punjab Kings in 2021.', 'start': 11318.475, 'duration': 3.702}, {'end': 11326.16, 'text': 'Eoin Morgan, who played for KKR.', 'start': 11322.177, 'duration': 3.983}, {'end': 11331.671, 'text': 'he also captained KKR, but unfortunately he was not picked this season.', 'start': 11326.16, 'duration': 5.511}, {'end': 11340.815, 'text': 'Chris Lynn from Australia, also remained unsold this season and to tie, we have Ben Cutting, Aditya Tare.', 'start': 11331.671, 'duration': 9.144}], 'summary': 'Player transfers in ipl 2021: raina to csk, smith to delhi, and malan to punjab kings. morgan not picked.', 'duration': 30.405, 'max_score': 11310.41, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs11310410.jpg'}], 'start': 10043.757, 'title': 'Ipl auction analysis', 'summary': 'Covers the analysis of the 2022 ipl auction, with 633 players participating, including 242 all-rounders, 215 bowlers, 112 batsmen, and 64 wicket keepers. it also details player distribution, with 17.69% batters, 10.11% wicketkeepers, 237 players sold, and 396 unsold. 
additionally, it involves creating new columns for retention, base price, and base price unit, and analyzing the count of retained and auction-picked players for each team.', 'chapters': [{'end': 10204.908, 'start': 10043.757, 'title': '2022 ipl auction analysis', 'summary': 'Demonstrates the analysis of the 2022 ipl auction data, revealing that 633 players participated, with 242 all-rounders, 215 bowlers, 112 batsmen, and 64 wicket keepers present in the auction.', 'duration': 161.151, 'highlights': ['633 players were part of the 2022 IPL auction. The analysis revealed a total of 633 players participating in the 2022 IPL auction, providing comprehensive data for further insights.', '242 all-rounders, 215 bowlers, 112 batsmen, and 64 wicket keepers participated in the auction. The data showcased a diverse participation with 242 all-rounders, 215 bowlers, 112 batsmen, and 64 wicket keepers present in the 2022 IPL auction, offering valuable insights into player distribution.']}, {'end': 10481.56, 'start': 10204.908, 'title': 'Ipl auction analysis', 'summary': 'Covers the analysis of player distribution in the ipl auction, revealing a majority of all-rounders and bowlers, with 17.69% batters and 10.11% wicketkeepers, and 237 players sold versus 396 unsold, while also showcasing the number of players bought by each team, with some acquiring the maximum limit of 25 players.', 'duration': 276.652, 'highlights': ['Majority of all-rounders and bowlers in the auction, with 17.69% batters and 10.11% wicketkeepers.', '237 players were sold and 396 players were unsold.', 'Some teams purchased the maximum limit of 25 players, while others acquired 24, 23, or 22 players.']}, {'end': 10766.235, 'start': 10481.56, 'title': 'Ipl data analysis: creating columns and team retention', 'summary': 'Involves creating new columns for retention, base price, and base price unit, replacing draft pick values with 0, and analyzing the count of retained and auction-picked players for each team 
in the 2022 ipl auction.', 'duration': 284.675, 'highlights': ['Analysing count of retained and auction-picked players for each team Using group by function, the analysis reveals that Chennai Super Kings picked 21 players from the auction and retained 4 players, while Delhi Capitals picked 20 players and retained 4.', 'Creating new columns for retention, base price, and base price unit The process involves creating columns for retention, base price, and base price unit, replacing draft pick values with 0, and splitting the base price value into two separate columns.', 'Replacing draft pick values with 0 and splitting base price value The draft pick values are replaced with 0 in the base price column, and the base price value is split into base price unit and base price columns.']}, {'end': 11356.748, 'start': 10766.235, 'title': 'Displaying ipl data analysis', 'summary': 'Demonstrates the analysis of ipl data including displaying players in each team based on different types, identifying the highest amount spent on a single player by each team, and listing the top players picked and those unsold for the 2022 season.', 'duration': 590.513, 'highlights': ["Mumbai Indians spent the highest amount of 15.25 crores for a single player, Ishan Kishan, followed by Chennai Super Kings' 14 crores for Deepak Chahar, and Kolkata Knight Riders' 12.25 crores for Shreyas Iyer. Mumbai Indians spent 15.25 crores on Ishan Kishan, Chennai Super Kings spent 14 crores on Deepak Chahar, and Kolkata Knight Riders spent 12.25 crores on Shreyas Iyer.", 'Ravindra Jadeja was retained for 16 crores by Chennai Super Kings, marking the highest retention price. Ravindra Jadeja was retained for 16 crores by Chennai Super Kings, marking the highest retention price.', 'The top 5 bowlers picked in the 2022 IPL included Deepak Chahar (14 crores), Shardul Thakur (10.75 crores), Prasidh Krishna (10 crores), Lockie Ferguson (10 crores), and Avesh Khan (10 crores). 
The top 5 bowlers picked in the 2022 IPL included Deepak Chahar (14 crores), Shardul Thakur (10.75 crores), Prasidh Krishna (10 crores), Lockie Ferguson (10 crores), and Avesh Khan (10 crores).', 'Shreyas Iyer was bought by Kolkata Knight Riders for 12.25 crores, Shimron Hetmyer by Rajasthan Royals for 8.5 crores, and other batters were also listed. Shreyas Iyer was bought for 12.25 crores by Kolkata Knight Riders, Shimron Hetmyer for 8.5 crores by Rajasthan Royals, and other batters were also listed.', 'Liam Livingstone was bought for 11.5 crores by Punjab Kings, Hasaranga and Harshal Patel were also among the top all-rounders picked. Liam Livingstone was bought for 11.5 crores by Punjab Kings, Hasaranga and Harshal Patel were also among the top all-rounders picked.', 'Players from the 2021 season not picked for the 2022 season included Suresh Raina, Steve Smith, Shakib Al Hasan, Dawid Malan, Eoin Morgan, Chris Lynn, Ben Cutting, and others. Players from the 2021 season not picked for the 2022 season included Suresh Raina, Steve Smith, Shakib Al Hasan, Dawid Malan, Eoin Morgan, Chris Lynn, Ben Cutting, and others.']}], 'duration': 1312.991, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs10043757.jpg', 'highlights': ['633 players participated in the 2022 IPL auction, providing comprehensive data for further insights.', 'Player distribution included 242 all-rounders, 215 bowlers, 112 batsmen, and 64 wicket keepers, offering valuable insights.', '237 players were sold and 396 players were unsold, showcasing the auction outcomes.', "Mumbai Indians spent the highest amount of 15.25 crores for a single player, followed by Chennai Super Kings' 14 crores for Deepak Chahar, and Kolkata Knight Riders' 12.25 crores for Shreyas Iyer.", 'Ravindra Jadeja was retained for 16 crores by Chennai Super Kings, marking the highest retention price.', 'The top 5 bowlers picked in the 2022 IPL included Deepak Chahar (14 crores), 
Shardul Thakur (10.75 crores), Prasidh Krishna (10 crores), Lockie Ferguson (10 crores), and Avesh Khan (10 crores).', 'Shreyas Iyer was bought for 12.25 crores by Kolkata Knight Riders, Shimron Hetmyer for 8.5 crores by Rajasthan Royals, and other batters were also listed.', 'Liam Livingstone was bought for 11.5 crores by Punjab Kings, Hasaranga and Harshal Patel were also among the top all-rounders picked.', 'Players from the 2021 season not picked for the 2022 season included Suresh Raina, Steve Smith, Shakib Al Hasan, Dawid Malan, Eoin Morgan, Chris Lynn, Ben Cutting, and others.']}, {'end': 14027.328, 'segs': [{'end': 11412.171, 'src': 'embed', 'start': 11385.336, 'weight': 1, 'content': [{'end': 11394.724, 'text': "In this video, we'll be using four publicly available datasets about Formula 1 that have information from 1950 to the 2019 F1 season.", 'start': 11385.336, 'duration': 9.388}, {'end': 11401.11, 'text': "We'll see how using the Python libraries, you can perform exploratory data analysis and draw valuable insights.", 'start': 11395.625, 'duration': 5.485}, {'end': 11404.533, 'text': "So let's begin by understanding what Formula 1 is.", 'start': 11401.951, 'duration': 2.582}, {'end': 11412.171, 'text': 'Formula 1, also known as F1, is a motorsport defined by open-wheeled, single-seat race cars.', 'start': 11406.226, 'duration': 5.945}], 'summary': 'Using 4 datasets, explore formula 1 data from 1950-2019 with python for insights.', 'duration': 26.835, 'max_score': 11385.336, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs11385336.jpg'}, {'end': 11468.847, 'src': 'embed', 'start': 11436.963, 'weight': 0, 'content': [{'end': 11438.144, 'text': 'Let me show them one by one.', 'start': 11436.963, 'duration': 1.181}, {'end': 11439.945, 'text': 'These are all CSV files.', 'start': 11438.544, 'duration': 1.401}, {'end': 11444.469, 'text': "Now, this is the first dataset that we'll be using in our demo.", 
'start': 11441.767, 'duration': 2.702}, {'end': 11448.792, 'text': 'So, if you notice, the dataset does not have column names.', 'start': 11445.069, 'duration': 3.723}, {'end': 11451.214, 'text': 'In fact, none of the datasets have column names.', 'start': 11449.073, 'duration': 2.141}, {'end': 11455.678, 'text': "So, while importing the dataset, we'll assign the column names using the names parameter.", 'start': 11451.815, 'duration': 3.863}, {'end': 11458.7, 'text': "I'll show that in a while.", 'start': 11456.619, 'duration': 2.081}, {'end': 11461.823, 'text': 'Now, this is the results dataset.', 'start': 11460.182, 'duration': 1.641}, {'end': 11468.847, 'text': 'The results data set has information about the result ID.', 'start': 11463.444, 'duration': 5.403}], 'summary': 'Demo uses csv files, assigns column names while importing datasets, and results dataset contains result id.', 'duration': 31.884, 'max_score': 11436.963, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs11436963.jpg'}, {'end': 12004.205, 'src': 'embed', 'start': 11950.523, 'weight': 2, 'content': [{'end': 11957.986, 'text': 'Now the next analysis is to merge all the dataset.', 'start': 11950.523, 'duration': 7.463}, {'end': 11966.348, 'text': "So here I'll create a data frame called df and I'll use pd.merge function.", 'start': 11959.966, 'duration': 6.382}, {'end': 11983.785, 'text': "First I'll use the results data frame that we just imported above this one and I'm going to merge it with races data frame.", 'start': 11967.849, 'duration': 15.936}, {'end': 11992.65, 'text': "and from the races data frame I'm going to take specific columns, like the race ID.", 'start': 11983.785, 'duration': 8.865}, {'end': 11995.832, 'text': 'then I need the year.', 'start': 11992.65, 'duration': 3.182}, {'end': 12004.205, 'text': 'after that I need name.', 'start': 11997.822, 'duration': 6.383}], 'summary': 'Merging datasets using pd.merge to create 
data frame df with specific columns like race id, year, and name.', 'duration': 53.682, 'max_score': 11950.523, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs11950523.jpg'}, {'end': 13378.651, 'src': 'embed', 'start': 13338.272, 'weight': 3, 'content': [{'end': 13340.533, 'text': 'Then we have Stewart, Clark and Lauda.', 'start': 13338.272, 'duration': 2.261}, {'end': 13346.675, 'text': 'The data that we have is not the most recent one.', 'start': 13342.974, 'duration': 3.701}, {'end': 13356.64, 'text': 'So if you check the actual stats, Hamilton has actually overtaken Michael Schumacher and has won over 103 close to 103 races as of now.', 'start': 13347.156, 'duration': 9.484}, {'end': 13362.002, 'text': 'And next we have Michael Schumacher.', 'start': 13359.541, 'duration': 2.461}, {'end': 13363.983, 'text': 'So this is not the most recent data.', 'start': 13362.082, 'duration': 1.901}, {'end': 13378.651, 'text': "Cool Now using this data, we'll create another bar plot for the top 10 drivers.", 'start': 13365.744, 'duration': 12.907}], 'summary': 'Hamilton has won close to 103 races, overtaking schumacher.', 'duration': 40.379, 'max_score': 13338.272, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs13338272.jpg'}, {'end': 13520.091, 'src': 'embed', 'start': 13483.513, 'weight': 4, 'content': [{'end': 13488.695, 'text': "let me see if everything is fine and then we'll run this.", 'start': 13483.513, 'duration': 5.182}, {'end': 13490.216, 'text': 'there you go.', 'start': 13488.695, 'duration': 1.521}, {'end': 13501.378, 'text': 'you can see we have a very nice horizontal bar plot that shows the top drivers in F1.', 'start': 13490.216, 'duration': 11.162}, {'end': 13510.905, 'text': 'so as per our data that we have, we have Michael Schumacher, who has won over 91 or close to 91 races.', 'start': 13501.378, 'duration': 9.527}, {'end': 13515.788, 
'text': 'Then we have Hamilton, Vettel, Alonso, Clark and others.', 'start': 13510.905, 'duration': 4.883}, {'end': 13520.091, 'text': 'All right.', 'start': 13515.788, 'duration': 4.303}], 'summary': 'Horizontal bar plot displays the top F1 drivers; Schumacher won close to 91 races, followed by Hamilton, Vettel, Alonso, Clark, and others.', 'duration': 36.578, 'max_score': 13483.513, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs13483513.jpg'}, {'end': 13637.949, 'src': 'embed', 'start': 13604.525, 'weight': 5, 'content': [{'end': 13613.509, 'text': 'The manufacturing companies or the constructors who have won the most number of races are Ferrari, McLaren, Williams.', 'start': 13604.525, 'duration': 8.984}, {'end': 13621.472, 'text': 'Then we have Mercedes, Red Bull, Team Lotus, Renault and others.', 'start': 13613.509, 'duration': 7.963}, {'end': 13635.167, 'text': "Okay, now we'll convert this data into a horizontal bar chart that shows the top 10 constructors.", 'start': 13621.472, 'duration': 13.695}, {'end': 13637.949, 'text': 'Let me show you how to do it.', 'start': 13636.228, 'duration': 1.721}], 'summary': 'Ferrari, McLaren, and Williams are the constructors with the most race wins in the dataset.', 'duration': 33.424, 'max_score': 13604.525, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs13604525.jpg'}, {'end': 13841.773, 'src': 'embed', 'start': 13807.836, 'weight': 6, 'content': [{'end': 13813.9, 'text': "So to do that I'm going to exclude the data points with grid 0, because they skew the data.", 'start': 13807.836, 'duration': 6.064}, {'end': 13818.683, 'text': 'A grid value of 0 means that the driver started from the pit lane.', 'start': 13813.9, 'duration': 4.783}, {'end': 13828.724, 'text': "So I'm using this function called regplot to plot my regression line.", 'start': 13818.683, 'duration': 10.041}, {'end': 13831.387, 
'text': 'Let me run it and show you the result.', 'start': 13828.724, 'duration': 2.663}, {'end': 13832.529, 'text': 'There you go.', 'start': 13831.387, 'duration': 1.142}, {'end': 13841.773, 'text': 'So if you see here, we can easily see there is a linear relationship between the starting and finishing position.', 'start': 13832.529, 'duration': 9.244}], 'summary': 'With grid 0 (pit-lane start) data points excluded, the regression plot shows a linear relationship between starting and finishing position.', 'duration': 33.937, 'max_score': 13807.836, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs13807836.jpg'}, {'end': 13933.043, 'src': 'embed', 'start': 13908.291, 'weight': 7, 'content': [{'end': 13919.936, 'text': 'You have the average fastest lap speed in kilometers per hour, then you have the different GPs, and you can see here the year values.', 'start': 13908.291, 'duration': 11.645}, {'end': 13929.7, 'text': 'Now, in general, what we see is there is a decrease in the average fastest lap speed from 2004 till around 2015 for almost all the circuits.', 'start': 13919.936, 'duration': 9.764}, {'end': 13933.043, 'text': 'And again from 2015 onwards the speed started to increase.', 'start': 13929.72, 'duration': 3.323}], 'summary': 'Average fastest lap speed decreased from 2004 until around 2015, then increased.', 'duration': 24.752, 'max_score': 13908.291, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs13908291.jpg'}], 'start': 11356.748, 'title': 'F1 data analysis and visualization', 'summary': 'Covers the analysis and visualization of Formula 1 datasets from 1950 to 2019 using Python libraries, including Pandas, NumPy, Matplotlib, and Seaborn. 
It includes importing the datasets, assigning column names, merging and processing the data, analyzing Grand Prix data, and visualizing F1 data with bar plots, showcasing key statistics such as wins and speeds.', 'chapters': [{'end': 11436.442, 'start': 11356.748, 'title': 'F1 data analysis 1950-2019', 'summary': "Introduces the use of Python libraries for exploratory data analysis on publicly available Formula 1 datasets from 1950 to 2019, highlighting the sport's key features and governance.", 'duration': 79.694, 'highlights': ['The sport of Formula 1, also known as F1, features the best drivers in the most powerful and technically advanced cars, governed by the FIA (International Automobile Federation).', 'In a recent F1 season such as 2019, 10 teams of two drivers each compete in a series of races over a year to determine the driver and constructor champions.', 'The demonstration uses Python libraries to perform exploratory data analysis and draw insights from four publicly available datasets about Formula 1.', 'The chapter first concludes the preceding demo session on IPL 2022 data analysis, which covered different functions and attributes within Python libraries for data manipulation and visualization.', 'That demo session focused on learning how to use different functions and attributes within Python libraries for data manipulation and visualization, while also addressing specific questions.']}, {'end': 11949.143, 'start': 11436.963, 'title': 'Importing and assigning column names to datasets', 'summary': 'The chapter covers the process of importing and assigning column names to datasets, including datasets on race results, races, drivers, and constructors, using the Pandas library and Notepad, and the use of libraries such as NumPy, Pandas, Matplotlib, and Seaborn for F1 sports analysis.', 'duration': 512.18, 'highlights': ['The chapter covers the process of importing and assigning column names to datasets, including datasets on race results, races, 
drivers, and constructors. It discusses the structure of the datasets and the need to assign column names while importing.', 'NumPy, Pandas, Matplotlib, and Seaborn are used for the F1 analysis, which addresses specific questions about the success of drivers and constructors, the impact of starting position on finishing position, and changes in car speed over the years.', 'Loading each dataset and assigning its column names using the Pandas library and Notepad is demonstrated step by step.']}, {'end': 13007.048, 'start': 11950.523, 'title': 'Merging and processing motorsport data', 'summary': 'The chapter explains merging multiple datasets using the pd.merge function, renaming and rearranging columns, dropping unnecessary columns, replacing values, changing data types, and checking the dataset shape, information, and the first 10 rows.', 'duration': 1056.525, 'highlights': ['The chapter explains merging multiple datasets using the pd.merge function, renaming and rearranging columns, dropping unnecessary columns, replacing values, changing data types, and checking the dataset shape, information, and the first 10 rows.', 'The merged dataset has 24197 rows and 15 columns.', 'The columns include year, Grand Prix name, round number, driver, constructor name, grid, position order, total number of points, time, time in milliseconds, fastest lap rank, fastest lap speed, nationality of the driver, and nationality of the constructor. 
']}, {'end': 13378.651, 'start': 13007.749, 'title': 'F1 Grand Prix analysis', 'summary': 'The chapter analyzes F1 Grand Prix data to visualize the distribution of wins and identify the top 10 drivers who have won the most races. In the dataset, Michael Schumacher leads with 91 Grand Prix wins; Hamilton has since overtaken him, with close to 103 wins as of the recording. The analysis also reveals that only a few drivers in the history of F1 have won over 20 Grands Prix, and some have won over 50, leading to a focus on the top 5 or top 10 drivers in the next section.', 'duration': 370.902, 'highlights': ['In the dataset, Michael Schumacher leads with 91 Grand Prix wins.', 'The dataset is not the most recent: Hamilton has since overtaken Michael Schumacher, with close to 103 wins as of the recording.', 'Only a few drivers in the history of F1 have won over 20 Grands Prix, and some have won over 50 races.']}, {'end': 13700.024, 'start': 13378.931, 'title': 'Visualizing F1 data with bar plots', 'summary': 'The chapter demonstrates creating horizontal bar plots for the top 10 F1 drivers and constructors, showcasing the number of wins for each and using colors to distinguish the data.', 'duration': 321.093, 'highlights': ['The chapter demonstrates creating a horizontal bar plot to display the top F1 drivers, with Michael Schumacher leading with close to 91 wins, followed by Hamilton, Vettel, Alonso, and Clark.', 'It also showcases a horizontal bar plot for the top F1 constructors, with Ferrari, McLaren, and Williams leading in the number of wins.', 'The demonstration includes specifying colors for the bars and setting alpha values, line width, and edge color to customize the plots for visual clarity. 
']}, {'end': 14027.328, 'start': 13700.024, 'title': 'Formula One data analysis', 'summary': 'The chapter explores Formula One data, including the constructors with the most GP wins, a regression plot of starting and finishing positions, changes in average fastest lap speeds over the years, and differences in speeds among Grand Prix circuits.', 'duration': 327.304, 'highlights': ['Ferrari is the team with the most Grand Prix wins, followed by McLaren, Williams, and Mercedes.', "Mercedes is likely to gain more wins by the end of the 2021 season because of Hamilton's performance.", 'The regression plot reveals a clear linear relationship between starting and finishing positions.', 'Average fastest lap speeds decreased until around 2015, then began to increase from 2015 onwards.', 'The Italian Grand Prix is the fastest race, with average speeds exceeding 240 km/h, while the Monaco Grand Prix is the slowest, with speeds below 160 km/h. 
']}], 'duration': 2670.58, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mLOwujrfRPs/pics/mLOwujrfRPs11356748.jpg', 'highlights': ['The chapter covers the process of importing and assigning column names to datasets, including datasets on race results, races, drivers, and constructors.', 'The demonstration uses Python libraries to perform exploratory data analysis and draw insights from four publicly available datasets about Formula 1.', 'The chapter explains merging multiple datasets using the pd.merge function, renaming and rearranging columns, dropping unnecessary columns, replacing values, changing data types, and checking the dataset shape, information, and the first 10 rows.', 'The dataset is not the most recent: Hamilton has since overtaken Michael Schumacher, with close to 103 wins as of the recording.', 'The chapter demonstrates creating a horizontal bar plot to display the top F1 drivers, with Michael Schumacher leading with close to 91 wins, followed by Hamilton, Vettel, Alonso, and Clark.', 'Ferrari is the team with the most Grand Prix wins, followed by McLaren, Williams, and Mercedes.', 'The regression plot shows a linear relationship between starting and finishing positions.', 'Average fastest lap speeds decreased until around 2015, then began to increase from 2015 onwards.']}], 'highlights': ['The chapter covers two hands-on projects on COVID data analysis using Python and Tableau, including using real-world data, Python libraries like NumPy, Pandas, Matplotlib, and Seaborn, and statistics on COVID-19 cases and deaths, including over 22 crore (220 million) global infections and 4.5 million deaths, with India reporting over 3.3 crore (33 million) confirmed cases and nearly 4,41,000 (441,000) deaths.', "Creating reports and visualizations using Tableau Public edition for 192 countries' COVID-19 data.", 'The athlete_events dataset contains 
271116 rows and 15 columns, providing substantial information for analysis.', 'The process of importing and merging two datasets using Pandas in a Jupyter notebook for Olympics data analysis.', 'Introduction to exploratory data analysis using the IPL auction dataset for IPL 2022 and its significance in world cricket.', "The process of removing the unnecessary first column, 'Unnamed: 0', using the DataFrame drop method and confirming its successful removal from the data frame.", '633 players participated in the 2022 IPL auction, providing comprehensive data for further insights.', 'The chapter covers the process of importing and assigning column names to datasets, including datasets on race results, races, drivers, and constructors.', 'The F1 dataset is not the most recent: Hamilton has since overtaken Michael Schumacher, with close to 103 wins as of the recording.']}
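
The F1 steps summarized above (assigning column names with the `names` parameter while importing, merging with `pd.merge`, counting wins, and excluding grid 0 before relating starting and finishing positions) can be sketched roughly as follows. This is a minimal sketch, not the presenter's notebook: the inline CSV text is made-up placeholder data, and the column names (`resultId`, `raceId`, `driverId`, `grid`, `positionOrder`, `driver`) and the `load` helper are assumptions chosen for illustration.

```python
import io

import pandas as pd

# Tiny made-up stand-ins for the headerless CSV files (NOT real F1 data).
RESULTS_CSV = """1,10,100,1,1
2,10,101,2,2
3,11,101,0,1
4,11,100,3,2
5,12,100,2,1
"""
RACES_CSV = """10,2018,Italian Grand Prix
11,2018,Monaco Grand Prix
12,2019,Italian Grand Prix
"""
DRIVERS_CSV = """100,Driver A
101,Driver B
"""


def load(csv_text, columns):
    """Hypothetical helper: read a CSV that has no header row.

    header=None tells pandas the first line is data, and names= supplies
    the column names, as described in the transcript.
    """
    return pd.read_csv(io.StringIO(csv_text), header=None, names=columns)


# Column names below are assumptions for this sketch.
results = load(RESULTS_CSV, ["resultId", "raceId", "driverId", "grid", "positionOrder"])
races = load(RACES_CSV, ["raceId", "year", "name"])
drivers = load(DRIVERS_CSV, ["driverId", "driver"])

# Merge results with selected race columns, then attach driver names.
df = pd.merge(results, races[["raceId", "year", "name"]], on="raceId", how="left")
df = pd.merge(df, drivers, on="driverId", how="left")

# A win is a row whose finishing position (positionOrder) is 1.
wins = df[df["positionOrder"] == 1]["driver"].value_counts()

# Grid 0 means a pit-lane start; drop those rows before comparing
# starting position with finishing position.
on_grid = df[df["grid"] != 0]

print(wins.head(10))
print(on_grid[["grid", "positionOrder"]].corr())
```

From here, `wins.head(10)` is the kind of series that would feed a horizontal bar plot, and `on_grid` is the kind of frame that would be passed to seaborn's `regplot`; both plotting steps are left out so the sketch depends only on pandas.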