title
Data Analysis with Python Course - Numpy, Pandas, Data Visualization
description
Learn the basics of Python, Numpy, Pandas, Data Visualization, and Exploratory Data Analysis in this course for beginners. This was originally presented as a live course.
By the end of the course, you will be able to build an end-to-end real-world course project and earn a verified certificate of accomplishment. There are no prerequisites for this course.
Learn more and register for a certificate of accomplishment here: http://zerotopandas.com
This full course video includes 6 lectures (all in this video):
• Introduction to Programming with Python
• Next Steps with Python
• Numerical Computing with Numpy
• Analyzing Tabular Data with Pandas
• Visualization with Matplotlib and Seaborn
• Exploratory Data Analysis - A Case Study
💻 Code References
• First steps with Python: https://jovian.ai/aakashns/first-steps-with-python
• Variables and data types: https://jovian.ai/aakashns/python-variables-and-data-types
• Conditional statements and loops: https://jovian.ai/aakashns/python-branching-and-loops
• Functions and scope: https://jovian.ai/aakashns/python-functions-and-scope
• Working with OS & files: https://jovian.ai/aakashns/python-os-and-filesystem
• Numerical computing with Numpy: https://jovian.ai/aakashns/python-numerical-computing-with-numpy
• 100 Numpy exercises: https://jovian.ai/aakashns/100-numpy-exercises
• Analyzing tabular data with Pandas: https://jovian.ai/aakashns/python-pandas-data-analysis
• Matplotlib & Seaborn tutorial: https://jovian.ai/aakashns/python-matplotlib-data-visualization
• Data visualization cheat sheet: https://jovian.ai/aakashns/dataviz-cheatsheet
• EDA on StackOverflow Developer Survey: https://jovian.ai/aakashns/python-eda-stackoverflow-survey
• Opendatasets Python package: https://github.com/JovianML/opendatasets
• EDA starter notebook: https://jovian.ai/aakashns/zerotopandas-course-project-starter
⭐️ Course Contents ⭐️
0:00:00 Course Introduction
Lecture 1
0:01:42 Python Programming Fundamentals
0:02:40 Course Curriculum
0:05:24 Notebook - First Steps with Python and Jupyter
0:08:30 Performing Arithmetic Operations with Python
0:11:34 Solving Multi-step problems using variables
0:20:17 Combining conditions with Logical operators
0:22:22 Adding text using Markdown
0:23:50 Saving and Uploading to Jovian
0:26:38 Variables and Datatypes in Python
0:31:28 Built-in Data types in Python
1:07:19 Further Reading
Lecture 2
1:08:46 Branching, Loops and Functions
1:09:02 Notebook - Branching using conditional statements and loops in Python
1:09:24 Branching with if, else, elif
1:15:25 Non Boolean conditions
1:19:00 Iteration with while loops
1:28:57 Iteration with for loops
1:36:27 Functions and scope in Python
1:36:53 Creating and using functions
1:42:24 Writing great functions in Python
1:45:38 Local variables and scope
2:08:19 Documenting functions using Docstrings
2:11:40 Exercise - Data Analysis for Vacation Planning
Lecture 3
2:17:17 Numerical Computing with Numpy
2:18:00 Notebook - Numerical Computing with Numpy
2:26:09 From Python Lists to Numpy Arrays
2:29:09 Operating on Numpy Arrays
2:34:33 Multidimensional Numpy Arrays
3:03:41 Array Indexing and Slicing
3:17:49 Exercises and Further Reading
3:20:50 Assignment 2 - Numpy Array Operations
3:29:16 100 Numpy Exercises
3:31:25 Reading from and Writing to Files using Python
Lecture 4
4:02:59 Analyzing Tabular Data with Pandas
4:03:58 Notebook - Analyzing Tabular Data with Pandas
4:16:33 Retrieving Data from a Data Frame
4:32:00 Analyzing Data from Data Frames
4:36:27 Querying and Sorting Rows
5:01:45 Grouping and Aggregation
5:11:26 Merging Data from Multiple Sources
5:26:00 Basic Plotting with Pandas
5:38:27 Assignment 3 - Pandas Practice
Lecture 5
5:52:48 Visualization with Matplotlib and Seaborn
5:54:04 Notebook - Data Visualization with Matplotlib and Seaborn
6:06:43 Line Charts
6:11:27 Improving Default Styles with Seaborn
6:16:51 Scatter Plots
6:28:14 Histogram
6:38:47 Bar Chart
6:50:00 Heatmap
6:57:08 Displaying Images with Matplotlib
7:03:37 Plotting multiple charts in a grid
7:15:42 References and further reading
7:20:17 Course Project - Exploratory Data Analysis
Lecture 6
7:49:56 Exploratory Data Analysis - A Case Study
7:50:55 Notebook - Exploratory Data Analysis - A Case Study
8:04:36 Data Preparation and Cleaning
8:19:37 Exploratory Analysis and Visualization
8:54:02 Asking and Answering Questions
9:22:57 Inferences and Conclusions
9:25:00 References and Future Work
9:29:41 Setting up and running Locally
9:34:21 Project Guidelines
9:45:00 Course Recap
9:48:01 What to do next?
9:49:10 Certificate of Accomplishment
9:50:11 What to do after this course?
9:52:16 Jovian Platform
✏️ This course is taught by Aakash N S, co-founder and CEO of Jovian.
Jovian's YouTube channel: https://youtube.com/jovianml
detail
{'title': 'Data Analysis with Python Course - Numpy, Pandas, Data Visualization', 'heatmap': [{'end': 7162.157, 'start': 6802.458, 'weight': 1}, {'end': 10025.485, 'start': 9661.263, 'weight': 0.793}, {'end': 17892.364, 'start': 17533.371, 'weight': 0.741}, {'end': 22192.814, 'start': 21823.385, 'weight': 0.792}], 'summary': 'This python data analysis course covers numpy, pandas, matplotlib, practical coding, and real-world datasets, offering a certificate of accomplishment. it includes solving a profit calculation problem resulting in a $125 profit, and delves into various python programming fundamentals, loan emi calculation, module usage, climate data manipulation, csv data handling, and data visualization techniques.', 'chapters': [{'end': 1249.357, 'segs': [{'end': 64.989, 'src': 'embed', 'start': 0.748, 'weight': 0, 'content': [{'end': 8.273, 'text': 'Data analysis with Python zero to pandas is a practical beginner friendly and coding focused introduction to data analysis.', 'start': 0.748, 'duration': 7.525}, {'end': 15.158, 'text': 'This is a live online course, and you can earn a verified certificate of accomplishment by completing this course.', 'start': 9.134, 'duration': 6.024}, {'end': 22.664, 'text': "If you're interested in learning data science with Python, but don't know where to start, then this course is designed just for you.", 'start': 15.979, 'duration': 6.685}, {'end': 26.166, 'text': 'You can learn more and register at zero to pandas.com.', 'start': 23.404, 'duration': 2.762}, {'end': 34.635, 'text': 'By the end of this course, you will be able to confidently use the Python programming language and its amazing ecosystem of data science,', 'start': 27.367, 'duration': 7.268}, {'end': 42.644, 'text': 'libraries like NumPy for mathematical and statistical computing, Pandas for data processing and analysis,', 'start': 34.635, 'duration': 8.009}, {'end': 46.168, 'text': 'Matplotlib for creating beautiful visualizations and much more.', 'start': 
42.644, 'duration': 3.524}, {'end': 51.231, 'text': 'You will get a chance to practice and improve your skills with weekly assignments.', 'start': 47.062, 'duration': 4.169}, {'end': 58.245, 'text': 'And you will also work on an end to end course project where you will perform data analysis on a large real world dataset.', 'start': 51.431, 'duration': 6.814}, {'end': 64.989, 'text': "This is a beginner friendly course, so you don't need to have any prior knowledge of Python or data science.", 'start': 59.144, 'duration': 5.845}], 'summary': 'Zero to pandas course offers practical beginner-friendly data analysis with python, including numpy, pandas, and matplotlib, with a real-world data analysis project and weekly assignments.', 'duration': 64.241, 'max_score': 0.748, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI748.jpg'}, {'end': 289.762, 'src': 'embed', 'start': 259.723, 'weight': 2, 'content': [{'end': 265.627, 'text': 'where you will get to work on an end-to-end project involving the real world dataset that is picked by you.', 'start': 259.723, 'duration': 5.904}, {'end': 270.391, 'text': "You will get a chance to apply all the tools and techniques that you've learned in this course.", 'start': 266.248, 'duration': 4.143}, {'end': 273.473, 'text': 'The goal of this course project, and this entire course,', 'start': 270.891, 'duration': 2.582}, {'end': 279.338, 'text': 'is to create something that you can proudly showcase as your own original work on your professional profile.', 'start': 273.473, 'duration': 5.865}, {'end': 281.999, 'text': 'You have to submit all the weekly assignments.', 'start': 279.838, 'duration': 2.161}, {'end': 289.762, 'text': 'You have to complete the course project and you have to follow the academic honesty policy, which essentially is no plagiarism.', 'start': 282.439, 'duration': 7.323}], 'summary': 'Course project involves real world dataset, applying learned tools, and 
showcasing original work on professional profile.', 'duration': 30.039, 'max_score': 259.723, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI259723.jpg'}, {'end': 752.03, 'src': 'embed', 'start': 721.495, 'weight': 4, 'content': [{'end': 724.077, 'text': 'So the cost of ice bag is 1.25.', 'start': 721.495, 'duration': 2.582}, {'end': 725.878, 'text': 'The profit margin is 20%.', 'start': 724.077, 'duration': 1.801}, {'end': 727.699, 'text': 'That is 20 by a hundred point two.', 'start': 725.878, 'duration': 1.821}, {'end': 731.782, 'text': 'So the profit per bag is the profit margin times the cost of the ice bag.', 'start': 728.1, 'duration': 3.682}, {'end': 736.365, 'text': 'So then we have this expression and then the number of bags is 500.', 'start': 732.182, 'duration': 4.183}, {'end': 739.929, 'text': 'And then the total profit then is number of bags times the profit per bag.', 'start': 736.365, 'duration': 3.564}, {'end': 742.212, 'text': 'So finally we end up with this expression.', 'start': 740.37, 'duration': 1.842}, {'end': 746.797, 'text': 'This is the point where you would pick up a calculator and here we are going to use Python to do it.', 'start': 742.432, 'duration': 4.365}, {'end': 752.03, 'text': 'So the grocery store makes a profit of $125 total.', 'start': 747.907, 'duration': 4.123}], 'summary': 'The grocery store makes a total profit of $125.', 'duration': 30.535, 'max_score': 721.495, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI721495.jpg'}], 'start': 0.748, 'title': 'Intro to python data analysis', 'summary': 'Introduces a beginner-friendly course on data analysis with python, offering a verified certificate of accomplishment, covering numpy, pandas, matplotlib, practical coding, and real-world datasets. it outlines the course structure, including weekly assignments and a final project involving a real-world dataset. 
additionally, it demonstrates solving a profit calculation problem using python, resulting in a total profit of $125.', 'chapters': [{'end': 240.526, 'start': 0.748, 'title': 'Zero to pandas: python data analysis', 'summary': 'The chapter introduces a beginner-friendly live online course on data analysis with python, offering a verified certificate of accomplishment upon completion, covering numpy, pandas, matplotlib, and more, with a focus on practical coding and real-world datasets, created by a data science platform with a global community and instruction by an experienced professional with 12 years of python experience.', 'duration': 239.778, 'highlights': ['The course offers a practical beginner-friendly introduction to data analysis with Python and provides a verified certificate of accomplishment upon completion.', 'The course covers the usage of Python programming language, NumPy for mathematical and statistical computing, Pandas for data processing and analysis, and Matplotlib for creating visualizations.', 'Students will have the opportunity to practice and improve their skills with weekly assignments and an end-to-end course project involving data analysis on a large real-world dataset.', 'The course is designed for beginners with no prior knowledge of Python or data science, and basic programming knowledge is helpful but not mandatory.', 'The instructor, Aakash, is an experienced professional with 12 years of Python experience, a background in computer science from IIT Bombay, and previous work at Twitter as a software engineer in data engineering and software development.']}, {'end': 695.423, 'start': 240.607, 'title': 'Zero to pandas course overview', 'summary': 'The chapter outlines the structure of the zero to pandas course, including weekly assignments, community support, and a final course project involving a real-world dataset chosen by the participant, emphasizing the importance of completing all assignments and following the academic honesty policy.', 'duration': 
454.816, 'highlights': ['The course project involves working on an end-to-end project with a real-world dataset chosen by the participant, applying all the tools and techniques learned in the course.', 'Weekly assignments provide opportunities to practice programming skills and learn through practical application.', 'Emphasis on completing all weekly assignments and following the academic honesty policy, which prohibits plagiarism.', 'Access to community support for help and guidance throughout the course.', 'Introduction to Jupyter notebooks and execution of code for Python programming, emphasizing practical learning through experimentation.']}, {'end': 1249.357, 'start': 696.363, 'title': 'Solving a profit calculation problem with python', 'summary': 'Demonstrates solving a profit calculation problem by selling 500 bags of ice at a 20% profit margin, resulting in a total profit of $125 using python, introducing the concept of variables, arithmetic operations, and evaluation of conditions.', 'duration': 552.994, 'highlights': ['Introducing the concept of variables and arithmetic operations in Python.', 'Using Python to calculate the total profit of $125 by selling 500 bags of ice at a 20% profit margin.', 'Explanation of the equality, not equal to, greater than, less than, greater than equal to, and less than equal to operators in Python.']}], 'duration': 1248.609, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI748.jpg', 'highlights': ['The course offers a practical beginner-friendly introduction to data analysis with Python and provides a verified certificate of accomplishment upon completion.', 'The course covers the usage of Python programming language, NumPy for mathematical and statistical computing, Pandas for data processing and analysis, and Matplotlib for creating visualizations.', 'The course project involves working on an end-to-end project with a real-world dataset chosen by the participant, 
applying all the tools and techniques learned in the course.', 'Students will have the opportunity to practice and improve their skills with weekly assignments and an end-to-end course project involving data analysis on a large real-world dataset.', 'Using Python to calculate the total profit of $125 by selling 500 bags of ice at a 20% profit margin.']}, {'end': 3163.497, 'segs': [{'end': 1451.212, 'src': 'embed', 'start': 1417.69, 'weight': 0, 'content': [{'end': 1419.011, 'text': 'So Markdown is very simple.', 'start': 1417.69, 'duration': 1.321}, {'end': 1420.513, 'text': 'It takes 20 minutes to learn it.', 'start': 1419.071, 'duration': 1.442}, {'end': 1423.235, 'text': "And it's a really nice way to write pretty text.", 'start': 1420.633, 'duration': 2.602}, {'end': 1426.077, 'text': 'So you can learn the full syntax of Markdown here.', 'start': 1423.635, 'duration': 2.442}, {'end': 1427.218, 'text': "I've linked to a tutorial.", 'start': 1426.117, 'duration': 1.101}, {'end': 1431.581, 'text': 'So that brings us pretty much to the end of our first Jupyter notebook.', 'start': 1427.778, 'duration': 3.803}, {'end': 1436.185, 'text': "So we've learned some Python here, basic arithmetic logic conditions.", 'start': 1431.762, 'duration': 4.423}, {'end': 1438.747, 'text': "We've learned a little bit about the Jupyter interface.", 'start': 1436.385, 'duration': 2.362}, {'end': 1440.448, 'text': 'Now, as I said, this.', 'start': 1439.227, 'duration': 1.221}, {'end': 1444.11, 'text': 'Notebook is running online on a free service.', 'start': 1441.329, 'duration': 2.781}, {'end': 1451.212, 'text': "Is this a website that lets you run Jupyter notebooks online for free, but since it's a free service, it's going to shut down after some time.", 'start': 1444.29, 'duration': 6.922}], 'summary': 'Markdown is simple, takes 20 mins to learn, used for pretty text. 
learned python basics, jupyter interface, running on free service.', 'duration': 33.522, 'max_score': 1417.69, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI1417690.jpg'}, {'end': 1611.562, 'src': 'embed', 'start': 1581.273, 'weight': 1, 'content': [{'end': 1583.094, 'text': 'And you can see that the notebook is here.', 'start': 1581.273, 'duration': 1.821}, {'end': 1584.375, 'text': 'You have all of these options.', 'start': 1583.314, 'duration': 1.061}, {'end': 1587.638, 'text': 'If you want to continue running, you can click run on binder and continue running.', 'start': 1584.495, 'duration': 3.143}, {'end': 1589.259, 'text': 'So this is how we use Jovian.', 'start': 1588.058, 'duration': 1.201}, {'end': 1593.705, 'text': 'And you will be using Jovian to actually work on the assignments and things like that.', 'start': 1590.382, 'duration': 3.323}, {'end': 1595.827, 'text': "So that's a quick intro for you.", 'start': 1593.785, 'duration': 2.042}, {'end': 1599.731, 'text': "Okay So we've covered our first steps with Python.", 'start': 1597.008, 'duration': 2.723}, {'end': 1602.173, 'text': "Now we're ready to move on to variables and data types.", 'start': 1599.791, 'duration': 2.382}, {'end': 1604.235, 'text': 'So just click the variables and data types link.', 'start': 1602.313, 'duration': 1.922}, {'end': 1607.258, 'text': 'Now this is yet another Jupyter notebook.', 'start': 1605.356, 'duration': 1.902}, {'end': 1611.562, 'text': 'Once again, click the run button, go to run and click run on binder.', 'start': 1607.478, 'duration': 4.084}], 'summary': 'Introducing jovian for running python assignments and notebooks, then moving on to variables and data types.', 'duration': 30.289, 'max_score': 1581.273, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI1581273.jpg'}, {'end': 1673.988, 'src': 'embed', 'start': 1645.405, 'weight': 2, 'content': 
[{'end': 1650.429, 'text': 'Okay So variables and we know already what variables are.', 'start': 1645.405, 'duration': 5.024}, {'end': 1656.274, 'text': 'They are containers for storing information for storing data and the data stored in a variable is called a value.', 'start': 1650.71, 'duration': 5.564}, {'end': 1659.837, 'text': 'And this is how you create and access the value of a variable.', 'start': 1656.855, 'duration': 2.982}, {'end': 1668.184, 'text': "So you set my favorite color and then you access my favorite color and a variable is created using this assignment statement that we've already seen.", 'start': 1660.318, 'duration': 7.866}, {'end': 1673.988, 'text': 'Note that this assignment statement is actually different from the equality comparison operator,', 'start': 1669.165, 'duration': 4.823}], 'summary': 'Variables are containers for storing data, created using assignment statements.', 'duration': 28.583, 'max_score': 1645.405, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI1645405.jpg'}, {'end': 1831.125, 'src': 'embed', 'start': 1801.253, 'weight': 3, 'content': [{'end': 1804.534, 'text': 'But there are a few rules you have to follow while naming Python variables.', 'start': 1801.253, 'duration': 3.281}, {'end': 1808.955, 'text': 'A variable name has to start with a letter or the underscore character.', 'start': 1805.034, 'duration': 3.921}, {'end': 1810.276, 'text': 'It cannot start with a number.', 'start': 1809.075, 'duration': 1.201}, {'end': 1814.297, 'text': 'A variable name can only contain lowercase and uppercase letters.', 'start': 1811.215, 'duration': 3.082}, {'end': 1821.82, 'text': 'So A to Z lowercase or uppercase, it can contain digits except at the first position and it can contain underscore characters.', 'start': 1814.777, 'duration': 7.043}, {'end': 1824.662, 'text': 'And then finally variable names are case sensitive.', 'start': 1822.341, 'duration': 2.321}, 
{'end': 1831.125, 'text': 'So all variable with all lowercase is different from all variable with A and V in uppercase.', 'start': 1825.182, 'duration': 5.943}], 'summary': 'Python variable names must follow specific rules: start with letter or underscore, no starting with number, case sensitive, can contain digits and underscores.', 'duration': 29.872, 'max_score': 1801.253, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI1801253.jpg'}, {'end': 1952.71, 'src': 'embed', 'start': 1928.771, 'weight': 4, 'content': [{'end': 1936.576, 'text': 'So it has the type list and Python has several built-in data types for storing different types of information into variables.', 'start': 1928.771, 'duration': 7.805}, {'end': 1940.299, 'text': 'What we will look at are some of the most commonly used data types.', 'start': 1937.157, 'duration': 3.142}, {'end': 1946.483, 'text': 'Now there are a few primitive data types, integers, floats, Booleans, none, and string.', 'start': 1940.699, 'duration': 5.784}, {'end': 1952.71, 'text': 'And then you have these other types like list, tuple dictionary, and then you have sets and classes and so on.', 'start': 1947.043, 'duration': 5.667}], 'summary': 'Python has multiple data types, including integers, floats, booleans, none, strings, lists, tuples, dictionaries, sets, and classes.', 'duration': 23.939, 'max_score': 1928.771, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI1928771.jpg'}, {'end': 2399.773, 'src': 'embed', 'start': 2354.336, 'weight': 5, 'content': [{'end': 2356.678, 'text': 'So first true gets converted to the integer one.', 'start': 2354.336, 'duration': 2.342}, {'end': 2365.125, 'text': "Then since you're adding an integer and a float that gets converted into a float and you end up with a float 4.0.", 'start': 2357.799, 'duration': 7.326}, {'end': 2366.526, 'text': "So that's something to keep in mind.", 
'start': 2365.125, 'duration': 1.401}, {'end': 2375.41, 'text': 'Another very useful thing is that any value in Python can be converted to a Boolean using the bool function.', 'start': 2367.448, 'duration': 7.962}, {'end': 2377.977, 'text': 'So you just use the bool function.', 'start': 2376.536, 'duration': 1.441}, {'end': 2382.7, 'text': 'put in anything that you write in Python, any value or any expression.', 'start': 2377.977, 'duration': 4.723}, {'end': 2386.543, 'text': 'technically speaking, that expression will get converted into a Boolean.', 'start': 2382.7, 'duration': 3.843}, {'end': 2391.927, 'text': 'Now, when it gets converted into a Boolean, how do we, how does Python, decide whether it should be a true or a false?', 'start': 2386.803, 'duration': 5.124}, {'end': 2395.41, 'text': 'There is only a limited number of things that become false.', 'start': 2392.728, 'duration': 2.682}, {'end': 2397.691, 'text': 'So the value False itself is false.', 'start': 2395.93, 'duration': 1.761}, {'end': 2399.773, 'text': 'And then all the empty states.', 'start': 2398.172, 'duration': 1.601}], 'summary': 'In python, true becomes 1, adding int and float results in float, and any value can be converted to boolean using bool function.', 'duration': 45.437, 'max_score': 2354.336, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI2354336.jpg'}, {'end': 2656.566, 'src': 'embed', 'start': 2626.893, 'weight': 7, 'content': [{'end': 2629.195, 'text': 'Three quotations is a special marker in Python.', 'start': 2626.893, 'duration': 2.302}, {'end': 2633.117, 'text': "That's our multi-line string, and you can experiment with these parts.", 'start': 2629.975, 'duration': 3.142}, {'end': 2634.418, 'text': "I'm going to leave it up to you.", 'start': 2633.337, 'duration': 1.081}, {'end': 2641.422, 'text': "But what's interesting is now already with string, even though it's a primary data type, it actually does contain a bunch 
of characters.", 'start': 2635.58, 'duration': 5.842}, {'end': 2642.642, 'text': "So it's a container as well.", 'start': 2641.442, 'duration': 1.2}, {'end': 2646.063, 'text': 'You can actually check the length of a string using the len function.', 'start': 2643.162, 'duration': 2.901}, {'end': 2648.123, 'text': 'You can check how many characters it contains.', 'start': 2646.083, 'duration': 2.04}, {'end': 2654.105, 'text': "For instance, length of my favorite movie, one flew over the cuckoo's nest is 31.", 'start': 2648.363, 'duration': 5.742}, {'end': 2656.566, 'text': 'And then there is, there are some special characters.', 'start': 2654.105, 'duration': 2.461}], 'summary': 'Python string contains 31 characters; length can be checked using len function.', 'duration': 29.673, 'max_score': 2626.893, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI2626893.jpg'}, {'end': 2969.67, 'src': 'embed', 'start': 2943.311, 'weight': 9, 'content': [{'end': 2947.474, 'text': 'These are pretty simple methods, but there are also some really powerful methods that we use all the time.', 'start': 2943.311, 'duration': 4.163}, {'end': 2950.316, 'text': 'So one is the replace method. In the replace method,', 'start': 2947.874, 'duration': 2.442}, {'end': 2957.561, 'text': "What we're going to say is we're going to take Saturday, the string, and then we're going to call replace and then give it a couple of inputs.", 'start': 2950.356, 'duration': 7.205}, {'end': 2960.023, 'text': 'So we want to give it the input.', 'start': 2958.081, 'duration': 1.942}, {'end': 2961.524, 'text': 'What do we want to replace??', 'start': 2960.243, 'duration': 1.281}, {'end': 2965.527, 'text': "So let's say we want to replace 'Satur' and we, what do we want to replace it with??", 'start': 2961.584, 'duration': 3.943}, {'end': 2969.67, 'text': "So let's say we want to replace it with 'Wednes', and that makes it.", 'start': 2965.807, 'duration': 
3.863}], 'summary': "The transcript discusses the usage of the replace method to replace 'Satur' with 'Wednes' in the string 'Saturday'.", 'duration': 26.359, 'max_score': 2943.311, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI2943311.jpg'}, {'end': 3049.449, 'src': 'embed', 'start': 3026.314, 'weight': 8, 'content': [{'end': 3035.38, 'text': 'Then one very useful method is the format method and the format method is used to combine the values of other data types with strings.', 'start': 3026.314, 'duration': 9.066}, {'end': 3037.181, 'text': "Let's look at an example here.", 'start': 3035.9, 'duration': 1.281}, {'end': 3039.162, 'text': 'So here we have a few variables.', 'start': 3037.701, 'duration': 1.461}, {'end': 3045.046, 'text': 'We have the cost of an ice bag, and then we have a profit margin, and then we have a number of bags as such.', 'start': 3039.242, 'duration': 5.804}, {'end': 3049.449, 'text': 'There are just three variables that we have initialized and they have different data types.', 'start': 3045.086, 'duration': 4.363}], 'summary': 'The format method combines values of different data types with strings.', 'duration': 23.135, 'max_score': 3026.314, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI3026314.jpg'}], 'start': 1249.797, 'title': 'Python basics and string manipulation', 'summary': 'Covers jupyter notebook basics, variables and data types in python, python data types, python strings basics, and manipulating strings in python with tips for beginners, usage of markdown formatting, creation of variables, data type conversion, and string manipulation techniques.', 'chapters': [{'end': 1644.191, 'start': 1249.797, 'title': 'Jupyter notebook basics', 'summary': 'The chapter covers the basics of jupyter notebook, including tips for beginners, markdown formatting, and using jovian to save and share notebooks, with a mention of free jupyter notebook hosting service 
and api key retrieval.', 'duration': 394.394, 'highlights': ['The chapter covers the basics of Jupyter Notebook, including tips for beginners, Markdown formatting, and using Jovian to save and share notebooks.', 'Mentions a free Jupyter notebook hosting service that shuts down after some time if the tab is left inactive for 10 minutes.', 'Explains how to retrieve an API key from Jovian.ml for saving and sharing notebooks.', 'Covers the process of using Jovian to save and share notebooks, including entering the API key and obtaining a link for sharing.', "Provides a brief introduction to the next topic, 'variables and data types,' and instructs to click the link to open the next Jupyter notebook."]}, {'end': 2173.677, 'start': 1645.405, 'title': 'Variables and data types in python', 'summary': 'Covers the basics of variables, including creation, assignment, reassignment, and naming conventions. it also discusses different data types in python, such as integers, floats, and their characteristics, as well as their conversions.', 'duration': 528.272, 'highlights': ['Variables are containers for storing data and can be defined using assignment statements, allowing for reassignment of values.', 'The chapter emphasizes the importance of understanding the difference between the assignment statement and the equality comparison operator in programming.', 'Python allows for various naming conventions for variables but imposes specific rules, such as starting with a letter or underscore, and being case sensitive.', 'The chapter explains different data types in Python, including integers, floats, and their characteristics, such as the arbitrary size of integers and limitations of floats.', 'It is highlighted that whole numbers written with a decimal point are treated as floats, and the conversion between floats and integers is possible using specific functions in Python.']}, {'end': 2485.129, 'start': 2174.679, 'title': 'Python data types', 'summary': 'Covers the conversion of 
data types in python, including integers, floats, booleans, and the none type, demonstrating the automatic conversion of booleans to integers, the use of the bool function to convert values to booleans, and the none type to indicate the absence of a value.', 'duration': 310.45, 'highlights': ['Booleans are automatically converted to integers in Python, where true becomes 1 and false becomes 0, and can be used in arithmetic operations.', 'The bool function in Python can convert any value or expression to a boolean, with certain values automatically converting to false, including 0, 0.0, and empty values of different data types.', 'The none type in Python is used to indicate the absence of a value, allowing for the declaration of variables with missing values without using zero or false as a fill-in.']}, {'end': 2781.95, 'start': 2485.789, 'title': 'Python strings basics', 'summary': 'Covers the basics of python strings, including creating strings with single or double quotes, handling quotes within strings, special characters, string length, converting strings to lists, and accessing individual characters and parts of a string.', 'duration': 296.161, 'highlights': ['Python strings basics: creating, handling quotes, special characters, length, converting to list, and accessing characters and parts.', 'Handling quotes within strings: using escape character or using different quotes for strings with quotes.', 'Special characters and their impact on string length: understanding characters like slash N and their effect on string length.', 'Converting strings to lists and accessing individual characters: using list function to convert a string into a list and accessing characters by index.']}, {'end': 3163.497, 'start': 2782.37, 'title': 'Manipulating strings in python', 'summary': 'Covers various string manipulation techniques in python, including checking string containment, concatenating strings, accessing string methods like lower, upper, and capitalize, using 
powerful methods like replace and format, and the importance of using format method for constructing messages with variables.', 'duration': 381.127, 'highlights': ['The format method is used to combine the values of other data types with strings.', 'The replace method is used to replace specific substrings within a string with new substrings.', 'The lower, upper, and capitalize methods are used to manipulate the case of strings.', 'Demonstrates the usage of in operator to check for substring containment within a string.', 'Explains the potential issues of using the concatenation operator without proper spacing when joining strings.']}], 'duration': 1913.7, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI1249797.jpg', 'highlights': ['Covers jupyter notebook basics, including tips for beginners and markdown formatting', 'Explains the process of using Jovian to save and share notebooks', 'Variables are containers for storing data and can be defined using assignment statements', 'Python allows for various naming conventions for variables but imposes specific rules', 'Explains different data types in Python, including integers, floats, and their characteristics', 'Booleans are automatically converted to integers in Python, where true becomes 1 and false becomes 0', 'The bool function in Python can convert any value or expression to a boolean', 'Python strings basics: creating, handling quotes, special characters, length, converting to list, and accessing characters and parts', 'The format method is used to combine the values of other data types with strings', 'The replace method is used to replace specific substrings within a string with new substrings']}, {'end': 4009.3, 'segs': [{'end': 3224.247, 'src': 'embed', 'start': 3163.497, 'weight': 1, 'content': [{'end': 3170.701, 'text': "but you might get issues when you're trying to concatenate or you're trying to add strings with numbers or other data types.", 
'start': 3163.497, 'duration': 7.204}, {'end': 3180.607, 'text': 'And the way to actually solve for that is to take each of these variables which you are doing concatenation with and convert them into a string using the STR function.', 'start': 3171.302, 'duration': 9.305}, {'end': 3185.567, 'text': 'So you see here, we get an error here, but here we can use the STR function.', 'start': 3181.764, 'duration': 3.803}, {'end': 3189.17, 'text': 'But the key idea here is that strings have methods.', 'start': 3186.308, 'duration': 2.862}, {'end': 3190.791, 'text': 'You can call these methods.', 'start': 3189.71, 'duration': 1.081}, {'end': 3194.134, 'text': 'You can give them some inputs and often these methods return new results.', 'start': 3190.831, 'duration': 3.303}, {'end': 3200.259, 'text': 'So as you experiment with this notebooks, try out these string methods and then try out some other string methods as well.', 'start': 3194.674, 'duration': 5.585}, {'end': 3207.599, 'text': 'Just like other data types, you have the STR function, which can take any data type and convert it into a string.', 'start': 3201.337, 'duration': 6.262}, {'end': 3210.7, 'text': 'Another minor thing is that you can actually compare strings.', 'start': 3208.159, 'duration': 2.541}, {'end': 3212.281, 'text': 'So here I have declared a string.', 'start': 3210.78, 'duration': 1.501}, {'end': 3215.662, 'text': 'So you see the single equal to here, this is an assignment operator.', 'start': 3212.661, 'duration': 3.001}, {'end': 3218.003, 'text': "So I've declared a string first name with the value John.", 'start': 3215.702, 'duration': 2.301}, {'end': 3224.247, 'text': 'And then I can compare the first name with, another string using the double equal to operator.', 'start': 3218.783, 'duration': 5.464}], 'summary': 'Use the str function to convert variables for concatenation, and explore string methods and comparisons for experimentation.', 'duration': 60.75, 'max_score': 3163.497, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI3163497.jpg'}, {'end': 3265.874, 'src': 'embed', 'start': 3244.304, 'weight': 0, 'content': [{'end': 3254.008, 'text': "A list in Python is what you can probably tell it it's an ordered collection of values and lists can hold values of different data types within them.", 'start': 3244.304, 'duration': 9.704}, {'end': 3258.991, 'text': 'So you may be familiar with arrays or lists in other languages like C or C plus plus.', 'start': 3254.989, 'duration': 4.002}, {'end': 3260.611, 'text': "If you're not, that's fine.", 'start': 3259.571, 'duration': 1.04}, {'end': 3265.874, 'text': 'But if you are, then you might know that sometimes these lists can only contain a specific type of value.', 'start': 3261.052, 'duration': 4.822}], 'summary': 'A python list is an ordered collection of values that can hold different data types.', 'duration': 21.57, 'max_score': 3244.304, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI3244304.jpg'}, {'end': 3587.397, 'src': 'embed', 'start': 3561.062, 'weight': 5, 'content': [{'end': 3566.084, 'text': 'And the way to do it is if you want to change the element at a particular index,', 'start': 3561.062, 'duration': 5.022}, {'end': 3571.366, 'text': 'you simply access that index and then use the assignment operator to put in the element.', 'start': 3566.084, 'duration': 5.282}, {'end': 3575.067, 'text': 'So what that does is now banana gets changed to blueberry.', 'start': 3571.926, 'duration': 3.141}, {'end': 3578.25, 'text': "So that's good that now we are modifying a list.", 'start': 3576.209, 'duration': 2.041}, {'end': 3585.395, 'text': 'Another way to modify list is to add something at the end and to add something at the end, we use the append method.', 'start': 3578.691, 'duration': 6.704}, {'end': 3587.397, 'text': 'So we use the append method here.', 'start': 3586.116, 'duration': 
1.281}], 'summary': 'Modify list elements by accessing index and using assignment operator. use append method to add at the end.', 'duration': 26.335, 'max_score': 3561.062, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI3561062.jpg'}, {'end': 3767.108, 'src': 'embed', 'start': 3737.981, 'weight': 6, 'content': [{'end': 3742.502, 'text': 'Tuples are immutable, which means they cannot be modified once they have been created.', 'start': 3737.981, 'duration': 4.521}, {'end': 3750.364, 'text': "Here I'm creating a tuple and the way to create a tuple is instead of a square bracket, you simply use a round bracket parenthesis.", 'start': 3743.142, 'duration': 7.222}, {'end': 3754.445, 'text': 'And they support many similar operations like you can check the length.', 'start': 3751.424, 'duration': 3.021}, {'end': 3755.545, 'text': 'The length is three.', 'start': 3754.825, 'duration': 0.72}, {'end': 3759.386, 'text': 'You can check an element with a positive index with a negative index.', 'start': 3755.705, 'duration': 3.681}, {'end': 3761.847, 'text': 'You can check if a fruit contains element.', 'start': 3759.546, 'duration': 2.301}, {'end': 3767.108, 'text': 'So the dates is a part of fruits, but what happens if you try to change an element?', 'start': 3762.047, 'duration': 5.061}], 'summary': 'Tuples are immutable and support various operations. 
length is 3.', 'duration': 29.127, 'max_score': 3737.981, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI3737981.jpg'}, {'end': 3827.853, 'src': 'embed', 'start': 3802.009, 'weight': 2, 'content': [{'end': 3806.173, 'text': 'And just like list tuples have some methods as well, but a fairly limited number of methods.', 'start': 3802.009, 'duration': 4.164}, {'end': 3809.745, 'text': 'And finally, I want to talk about dictionaries.', 'start': 3807.324, 'duration': 2.421}, {'end': 3812.766, 'text': 'A dictionary is an unordered collection of values.', 'start': 3810.145, 'duration': 2.621}, {'end': 3816.808, 'text': 'So there is no specific order to the values stored in a dictionary.', 'start': 3812.806, 'duration': 4.002}, {'end': 3819.909, 'text': 'Now, if you do not have ordered, how do you access these values?', 'start': 3817.168, 'duration': 2.741}, {'end': 3821.71, 'text': 'So what dictionaries have?', 'start': 3820.47, 'duration': 1.24}, {'end': 3825.752, 'text': 'is that, apart from values that you put into the dictionary, they also have keys?', 'start': 3821.71, 'duration': 4.042}, {'end': 3827.853, 'text': 'In a sense, it is a lookup table.', 'start': 3826.292, 'duration': 1.561}], 'summary': 'Dictionaries store unordered values with keys for lookup.', 'duration': 25.844, 'max_score': 3802.009, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI3802009.jpg'}], 'start': 3163.497, 'title': 'Python data types and manipulations', 'summary': 'Covers string methods, comparison operators, and python lists, tuples, and dictionaries. 
it includes usage of the str function, methods for strings, comparison operators, and examples of list indexing, modification methods, tuple immutability, and dictionary key-value pairs.', 'chapters': [{'end': 3243.022, 'start': 3163.497, 'title': 'String methods and comparison operators', 'summary': 'Discusses the usage of the str function to convert variables into strings for concatenation, the availability of methods for strings, and the comparison operators for strings, with examples of results and usage of the operators.', 'duration': 79.525, 'highlights': ['The STR function can be used to convert variables into strings for concatenation, resolving issues when adding strings with numbers or other data types.', 'Strings have methods that can be called with inputs, which often return new results, providing flexibility in string manipulation.', "Comparison operators like '==' and '!=' can be used to compare strings, producing boolean results of 'true' or 'false' based on the comparison."]}, {'end': 4009.3, 'start': 3244.304, 'title': 'Python lists, tuples, and dictionaries', 'summary': 'Covers the creation and manipulation of lists, tuples, and dictionaries in python, including examples of list indexing, list modification methods, tuple immutability, and dictionary key-value pairs with quantifiable examples.', 'duration': 764.996, 'highlights': ['Python lists can hold values of different data types and can be accessed using list indexing starting from zero.', "Lists can be modified by changing elements at specific indices, appending new elements, inserting elements at specific indices, and removing elements using methods like 'append', 'insert', 'remove', and 'pop'.", "Tuples are immutable data structures in Python, and they are created using round brackets instead of square brackets like lists, and they do not support modification methods like 'append' or 'remove'.", 'Dictionaries in Python are unordered collections of key-value pairs, providing a more organized 
way to store and access data, and they support methods for accessing keys, values, and key-value pairs.']}], 'duration': 845.803, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI3163497.jpg', 'highlights': ['Python lists can hold values of different data types and can be accessed using list indexing starting from zero.', 'Strings have methods that can be called with inputs, which often return new results, providing flexibility in string manipulation.', 'Dictionaries in Python are unordered collections of key-value pairs, providing a more organized way to store and access data, and they support methods for accessing keys, values, and key-value pairs.', 'The STR function can be used to convert variables into strings for concatenation, resolving issues when adding strings with numbers or other data types.', "Comparison operators like '==' and '!=' can be used to compare strings, producing boolean results of 'true' or 'false' based on the comparison.", "Lists can be modified by changing elements at specific indices, appending new elements, inserting elements at specific indices, and removing elements using methods like 'append', 'insert', 'remove', and 'pop'.", "Tuples are immutable data structures in Python, and they are created using round brackets instead of square brackets like lists, and they do not support modification methods like 'append' or 'remove'."]}, {'end': 6125.293, 'segs': [{'end': 4102.883, 'src': 'embed', 'start': 4074.362, 'weight': 4, 'content': [{'end': 4078.667, 'text': "So once I'm logged into Jovian dot ML, I copy the API key and paste it back here.", 'start': 4074.362, 'duration': 4.305}, {'end': 4088.579, 'text': 'And what that does is that takes the notebook that we were just looking at and it captures a snapshot of that notebook and then it puts it onto your Jovian ML profile.', 'start': 4079.813, 'duration': 8.766}, {'end': 4090.08, 'text': 'So now we can close this.', 'start': 
4089.019, 'duration': 1.061}, {'end': 4094.283, 'text': 'And then this is then now going to go away in this binder instance.', 'start': 4090.56, 'duration': 3.723}, {'end': 4102.883, 'text': 'But my Jupyter notebook is here for me to view whenever I need to, and I can go back and I can run it using the run button.', 'start': 4095.538, 'duration': 7.345}], 'summary': 'After logging into jovian ml, the api key is used to capture and upload a notebook snapshot, accessible for future use.', 'duration': 28.521, 'max_score': 4074.362, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI4074362.jpg'}, {'end': 4770.889, 'src': 'embed', 'start': 4745.906, 'weight': 0, 'content': [{'end': 4752.788, 'text': "The next thing that we're going to look at is iteration and iteration is an extension of condition or an extension of branching.", 'start': 4745.906, 'duration': 6.882}, {'end': 4754.366, 'text': 'In a sense.', 'start': 4753.945, 'duration': 0.421}, {'end': 4760.375, 'text': 'So what it allows you to do is it allows you to run one statement or a set of statements multiple times.', 'start': 4755.007, 'duration': 5.368}, {'end': 4765.248, 'text': 'So we have something called the while loop in Python, which works like this.', 'start': 4761.567, 'duration': 3.681}, {'end': 4770.889, 'text': 'So you say while, and then you put in a condition and then as long as that condition is true.', 'start': 4765.848, 'duration': 5.041}], 'summary': 'Iteration in python allows running statements multiple times.', 'duration': 24.983, 'max_score': 4745.906, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI4745906.jpg'}, {'end': 5095.468, 'src': 'embed', 'start': 5065.684, 'weight': 2, 'content': [{'end': 5067.304, 'text': 'And when you run the cell now,', 'start': 5065.684, 'duration': 1.62}, {'end': 5072.765, 'text': "it's just going to keep on running continuously forever and you 
will not be able to execute any other code in the notebook.", 'start': 5067.304, 'duration': 5.461}, {'end': 5077.926, 'text': 'So this is called an infinite loop because the program is stuck in the loop forever.', 'start': 5073.125, 'duration': 4.801}, {'end': 5080.367, 'text': 'And the way to come out of this is to.', 'start': 5078.426, 'duration': 1.941}, {'end': 5084.532, 'text': 'interrupt this execution or to prevent, stop this execution.', 'start': 5081.227, 'duration': 3.305}, {'end': 5086.235, 'text': 'And there are a couple of ways to do that.', 'start': 5084.612, 'duration': 1.623}, {'end': 5093.946, 'text': "So you can go to Kernel > Interrupt, and that's going to interrupt the loop, and then you can make the change and rerun the cell.", 'start': 5086.695, 'duration': 7.251}, {'end': 5095.468, 'text': 'And then the other option.', 'start': 5094.467, 'duration': 1.001}], 'summary': 'Infinite loop prevents code execution. interrupt or stop to resolve.', 'duration': 29.784, 'max_score': 5065.684, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI5065684.jpg'}, {'end': 5389.657, 'src': 'embed', 'start': 5362.575, 'weight': 1, 'content': [{'end': 5367.92, 'text': 'that is when you use a for loop, and they have a very, very nice, simple, intuitive syntax.', 'start': 5362.575, 'duration': 5.345}, {'end': 5371.563, 'text': 'So what we say is for value in sequence.', 'start': 5368.34, 'duration': 3.223}, {'end': 5373.445, 'text': 'for value in sequence.', 'start': 5372.424, 'duration': 1.021}, {'end': 5380.79, 'text': 'So we have a sequence and then we take using the in operator, we take one by one, each value from the sequence.', 'start': 5373.485, 'duration': 7.305}, {'end': 5387.175, 'text': 'once we put it into this for statement and then we can execute a bunch of statements repeatedly.', 'start': 5381.131, 'duration': 6.044}, {'end': 5389.657, 'text': 'a bunch of statements for each value from that 
sequence.', 'start': 5387.175, 'duration': 2.482}], 'summary': 'For loops in python have a simple, intuitive syntax: for value in sequence. it allows executing statements repeatedly for each value from the sequence.', 'duration': 27.082, 'max_score': 5362.575, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI5362575.jpg'}, {'end': 5859.867, 'src': 'embed', 'start': 5833.225, 'weight': 3, 'content': [{'end': 5838.25, 'text': 'There is some logic, which you think can be used again and again on different data or needs to be.', 'start': 5833.225, 'duration': 5.025}, {'end': 5844.134, 'text': 'You do not want to write the same set of code each time you want to do that, perform that operation.', 'start': 5839.07, 'duration': 5.064}, {'end': 5849.579, 'text': 'So you convert it into a function, and a function takes one or more inputs,', 'start': 5844.595, 'duration': 4.984}, {'end': 5853.522, 'text': 'perform certain operations on those inputs and then often returns an output as well.', 'start': 5849.579, 'duration': 3.943}, {'end': 5859.867, 'text': 'Okay And Python provides many built-in functions like print Len and so on, but you can also define your own functions.', 'start': 5853.742, 'duration': 6.125}], 'summary': 'Python allows defining reusable functions to perform operations on inputs and return outputs.', 'duration': 26.642, 'max_score': 5833.225, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI5833225.jpg'}], 'start': 4010.482, 'title': 'Python programming fundamentals', 'summary': 'Covers a range of fundamental python programming concepts, including experiments with dictionaries, working with jovian ml, branching, loops, iteration, functions, and scope. it includes examples and tips, showcasing the efficiency of loops and the flexibility of functions, and provides solutions for potential pitfalls. 
the content suggests further learning resources for dictionaries and emphasizes the importance of avoiding infinite loops.', 'chapters': [{'end': 4048.082, 'start': 4010.482, 'title': 'Python dictionary experiments', 'summary': 'Introduces various experiments with dictionaries, including using the same key multiple times, creating a copy of a dictionary, changing the value associated with a key, and exploring hierarchical structures. it also suggests further learning resources.', 'duration': 37.6, 'highlights': ['The chapter introduces various experiments with dictionaries, including using the same key multiple times, creating a copy of a dictionary, changing the value associated with a key, and exploring hierarchical structures.', 'The chapter suggests trying out experiments such as using the same key multiple times, creating a copy of a dictionary, and changing the value associated with a key.', 'The chapter provides links to Python documentation and other tutorials for further learning about data types in Python.']}, {'end': 4745.346, 'start': 4048.082, 'title': 'Working with jovian ml and branching in python', 'summary': 'Covers the process of importing and running the jovian library to capture and upload a snapshot of the notebook onto jovian ml, along with an introduction to branching in python using if, else, and elif statements, demonstrating their functionality with examples and providing tips on nesting statements.', 'duration': 697.264, 'highlights': ['The chapter covers the process of importing and running the Jovian library to capture and upload a snapshot of the notebook onto Jovian ML', 'An introduction to branching in Python using if, else, and elif statements', 'Demonstrating the functionality of if, else, and elif statements with examples', 'Providing tips on nesting statements and avoiding deep nesting']}, {'end': 5041.97, 'start': 4745.906, 'title': 'Python loops and iteration', 'summary': 'Covers the concept of while loop in python, 
demonstrating how to calculate the factorial of a number using a while loop, highlighting the efficiency and power of loops in processing data with minimal lines of code and showcasing examples of patterns created using while loops.', 'duration': 296.064, 'highlights': ['The while loop in Python allows running a set of statements multiple times based on a given condition, exemplified by calculating the factorial of a number, such as 100, using a while loop, showcasing the efficiency of computation with minimal code lines and time taken (2 milliseconds).', 'Demonstrating the power of loops, the chapter emphasizes that computers can process a vast amount of data quickly with a few lines of code, making loops a powerful tool in programming.', 'The chapter provides exercises to practice while loops by creating patterns using asterisk characters, encouraging experimentation and understanding of loop concepts in Python.', 'Examples of patterns created using while loops, including a printed pattern, mirror image, and diamond pattern, serve as practical exercises to reinforce understanding of loops and iteration in Python programming.']}, {'end': 5529.442, 'start': 5043.071, 'title': 'Working with loops in python', 'summary': 'Discusses the importance of avoiding infinite loops in python, highlighting the potential pitfalls and providing solutions, and then delves into the usage of break and continue statements within while loops and the intuitive syntax of for loops.', 'duration': 486.371, 'highlights': ['The chapter emphasizes the importance of avoiding infinite loops in Python by explaining the potential consequences, such as the program getting stuck, preventing other code execution, and provides solutions through interrupting the execution or using the stop/interrupt button, ensuring smooth code execution and preventing notebook disruptions.', 'The detailed explanation of the break statement within a while loop showcases its functionality in stopping the loop based on 
a specific condition, illustrated through a practical example of breaking out of the loop when a certain value is reached, effectively demonstrating its utility in controlling loop execution.', 'The continue statement within a while loop is explained, emphasizing that it skips the remaining statements in the current iteration when a condition is met, demonstrated through an example showing which statements are skipped.', "The chapter provides a comprehensive understanding of for loops, elucidating their usage for iterating over sequences like lists, tuples, strings, and dictionaries, and emphasizes the intuitive syntax 'for value in sequence', effectively enabling repetitive execution of statements for each value from the sequence, enhancing code readability and efficiency."]}, {'end': 6125.293, 'start': 5529.722, 'title': 'Python loops, functions, and scope', 'summary': 'Covers for loops, range function, break and continue statements, the pass statement, and functions in python, emphasizing the concept of functions as reusable sets of instructions and the flexibility of functions with arguments.', 'duration': 595.571, 'highlights': ['The chapter covers for loops, range function, break and continue statements, and the pass statement, emphasizing their usage and versatility.', 'Functions in Python are described as reusable sets of instructions and are shown to be flexible with arguments, allowing for the creation of functions that can perform the same operation on different values.']}], 'duration': 2114.811, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI4010482.jpg', 'highlights': ['The while loop in Python allows running a set of statements multiple times based on a given condition, exemplified by calculating the factorial of a number, such as 100, using a while loop, showcasing the efficiency of computation with 
minimal code lines and time taken (2 milliseconds).', "The chapter provides a comprehensive understanding of for loops, elucidating their usage for iterating over sequences like lists, tuples, strings, and dictionaries, and emphasizes the intuitive syntax 'for value in sequence', effectively enabling repetitive execution of statements for each value from the sequence, enhancing code readability and efficiency.", 'The chapter emphasizes the importance of avoiding infinite loops in Python by explaining the potential consequences, such as the program getting stuck, preventing other code execution, and provides solutions through interrupting the execution or using the stop/interrupt button, ensuring smooth code execution and preventing notebook disruptions.', 'Functions in Python are described as reusable sets of instructions and are shown to be flexible with arguments, allowing for the creation of functions that can perform the same operation on different values.', 'The chapter covers the process of importing and running the Jovian library to capture and upload a snapshot of the notebook onto Jovian ML']}, {'end': 7050.018, 'segs': [{'end': 6171.496, 'src': 'embed', 'start': 6145.308, 'weight': 0, 'content': [{'end': 6152.875, 'text': 'So the next thing we want to look at is how to write great functions in Python, because, as a programmer, you will be, and you should be,', 'start': 6145.308, 'duration': 7.567}, {'end': 6156.198, 'text': 'spending most of your time writing and using functions.', 'start': 6152.875, 'duration': 3.323}, {'end': 6161.662, 'text': 'The more functions you write, the better you get at programming because you then learn to structure.', 'start': 6156.978, 'duration': 4.684}, {'end': 6166.088, 'text': 'Okay what you need to do into small functions that you can reuse in different ways.', 'start': 6161.682, 'duration': 4.406}, {'end': 6171.496, 'text': 'Okay And we are going to explore how to write a great function and using the many features that 
Python offers.', 'start': 6166.309, 'duration': 5.187}], 'summary': 'Learn to write great functions in python for better programming. reuse small functions in different ways.', 'duration': 26.188, 'max_score': 6145.308, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI6145308.jpg'}, {'end': 6282.207, 'src': 'embed', 'start': 6252.2, 'weight': 4, 'content': [{'end': 6254.203, 'text': "And then we'll build this function step-by-step.", 'start': 6252.2, 'duration': 2.003}, {'end': 6262.095, 'text': "So we'll also see how to build a good function and we'll use certain features of Python to make it very flexible and powerful.", 'start': 6254.584, 'duration': 7.511}, {'end': 6267.199, 'text': 'Okay. So the simplest thing that we can do is since we have to come, since we have to compare EMIs,', 'start': 6262.476, 'duration': 4.723}, {'end': 6271.161, 'text': 'we need to calculate these EMIs or monthly installments for both of these loans.', 'start': 6267.199, 'duration': 3.962}, {'end': 6279.266, 'text': "So it'd be helpful to define a function so that we do not have to type out the same logic for each of these loans.", 'start': 6271.582, 'duration': 7.684}, {'end': 6279.646, 'text': 'All right.', 'start': 6279.366, 'duration': 0.28}, {'end': 6282.207, 'text': "So we're going to define a function called the loan EMI.", 'start': 6280.066, 'duration': 2.141}], 'summary': 'Building a flexible and powerful function in python to calculate emis for loans.', 'duration': 30.007, 'max_score': 6252.2, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI6252200.jpg'}, {'end': 6510.541, 'src': 'embed', 'start': 6482.466, 'weight': 1, 'content': [{'end': 6484.929, 'text': 'We pass in eight into 12, those many months.', 'start': 6482.466, 'duration': 2.463}, {'end': 6487.791, 'text': 'And for the 10 year loan, we pass in one 120 months.', 'start': 6485.009, 'duration': 2.782}, 
{'end': 6490.494, 'text': 'We still are not considering the down payment.', 'start': 6488.292, 'duration': 2.202}, {'end': 6498.161, 'text': 'So you can see that the EMI, obviously for the 10 year loan will be smaller than that for the eight year loan, as you might expect.', 'start': 6490.954, 'duration': 7.207}, {'end': 6500.816, 'text': 'Okay And now this is great.', 'start': 6499.375, 'duration': 1.441}, {'end': 6505.858, 'text': 'We can see visually that these values are different, but it would be nice to compare them as numbers.', 'start': 6501.036, 'duration': 4.822}, {'end': 6508.42, 'text': 'And maybe we want to calculate the difference and so on.', 'start': 6506.219, 'duration': 2.201}, {'end': 6510.541, 'text': 'And that is where we can actually use a return value.', 'start': 6508.5, 'duration': 2.041}], 'summary': 'Comparison of emis for 8-year and 10-year loans, with visual and numerical analysis.', 'duration': 28.075, 'max_score': 6482.466, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI6482466.jpg'}, {'end': 6565.38, 'src': 'embed', 'start': 6539.496, 'weight': 2, 'content': [{'end': 6545.62, 'text': 'So now we get an EMI one EMI two, and we can now take a difference of these two and we can check.', 'start': 6539.496, 'duration': 6.124}, {'end': 6546.24, 'text': 'Okay Okay.', 'start': 6545.74, 'duration': 0.5}, {'end': 6547.421, 'text': "There's a difference of $2,000 between the two EMIs.", 'start': 6546.26, 'duration': 1.161}, {'end': 6556.055, 'text': "Again, right now we are not considering down payment or interest, but it's a good, it's a good thing to see how things are evolving.", 'start': 6549.782, 'duration': 6.273}, {'end': 6565.38, 'text': "Okay So next up, let us add the down payment, the immediate down payment, the amount that you're going to pay right now that needs to be deducted.", 'start': 6557.136, 'duration': 8.244}], 'summary': 'The difference between two emis is $2,000, 
excluding down payment and interest.', 'duration': 25.884, 'max_score': 6539.496, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI6539496.jpg'}, {'end': 6631.243, 'src': 'embed', 'start': 6599.6, 'weight': 3, 'content': [{'end': 6604.404, 'text': 'So here we say we pass in the loan amount, we pass in the duration, and then we pass in the down payment.', 'start': 6599.6, 'duration': 4.804}, {'end': 6608.286, 'text': "That's the first loan, the eight year loan with a down payment of 300,000.", 'start': 6604.984, 'duration': 3.302}, {'end': 6611.908, 'text': 'And now we get back a $10,000.', 'start': 6608.286, 'duration': 3.622}, {'end': 6615.251, 'text': 'We get back $10,000 EMI, but on the other hand,', 'start': 6611.909, 'duration': 3.342}, {'end': 6624.218, 'text': "here what we have is we've just passed in the loan amount and we just passed in the amount and duration and we've not passed in a third argument.", 'start': 6615.251, 'duration': 8.967}, {'end': 6631.243, 'text': 'So when we do not pass it in Python takes and converts this, Python simply uses a default value down payment equals zero.', 'start': 6624.278, 'duration': 6.965}], 'summary': 'When the down payment argument is omitted, python uses its default value of 0.', 'duration': 31.643, 'max_score': 6599.6, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI6599600.jpg'}, {'end': 6874.141, 'src': 'embed', 'start': 6842.335, 'weight': 5, 'content': [{'end': 6843.136, 'text': 'This is completely wrong.', 'start': 6842.335, 'duration': 0.801}, {'end': 6847.966, 'text': 'So the way to avoid that is to use something called named arguments.', 'start': 6844.284, 'duration': 3.682}, {'end': 6854.509, 'text': 'Okay So while invoking a function with many arguments, it can make, it can get confusing and there can be human errors.', 'start': 6848.306, 'duration': 6.203}, {'end': 6860.853, 'text': 'So what you can do is you can 
specify the name of the argument before the actual value that you pass in.', 'start': 6854.869, 'duration': 5.984}, {'end': 6866.215, 'text': 'So you can say loan EMI, and then I have written out each argument on a separate line and you can do that.', 'start': 6860.913, 'duration': 5.302}, {'end': 6869.097, 'text': 'You can split the function invocation into separate lines.', 'start': 6866.255, 'duration': 2.842}, {'end': 6874.141, 'text': "But you don't have to, you can write it all on the same line as well as I've done for the second case.", 'start': 6869.617, 'duration': 4.524}], 'summary': 'Using named arguments in function invocation can prevent confusion and human errors.', 'duration': 31.806, 'max_score': 6842.335, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI6842335.jpg'}, {'end': 6994.783, 'src': 'embed', 'start': 6967.338, 'weight': 7, 'content': [{'end': 6971.021, 'text': 'So functions within functions, using functions within functions is a very powerful technique.', 'start': 6967.338, 'duration': 3.683}, {'end': 6976.846, 'text': "you can try this out and it'll be a good exercise, but since this is such a common thing, rounding numbers up and down.", 'start': 6971.442, 'duration': 5.404}, {'end': 6979.329, 'text': 'Python provides a built-in function for it.', 'start': 6977.587, 'duration': 1.742}, {'end': 6984.194, 'text': 'except that this built-in function is part of the Python standard library.', 'start': 6980.35, 'duration': 3.844}, {'end': 6986.616, 'text': 'So because there are a lot of.', 'start': 6984.654, 'duration': 1.962}, {'end': 6994.783, 'text': 'Python is a general purpose language that is applied to many different use cases in scientific computing and data analysis in software development.', 'start': 6986.616, 'duration': 8.167}], 'summary': "Python's powerful technique of using functions within functions is common and part of its standard library, making it a versatile 
language for scientific computing, data analysis, and software development.", 'duration': 27.445, 'max_score': 6967.338, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI6967338.jpg'}, {'end': 7054.599, 'src': 'embed', 'start': 7026.854, 'weight': 6, 'content': [{'end': 7029.255, 'text': 'And what they do is they give you a way of organizing.', 'start': 7026.854, 'duration': 2.401}, {'end': 7033.659, 'text': 'large Python projects into files and folders.', 'start': 7030.552, 'duration': 3.107}, {'end': 7038.57, 'text': 'And the key benefit that modules offer is a namespaces.', 'start': 7034.802, 'duration': 3.768}, {'end': 7044.116, 'text': "that is when you, When you want to use something from a module, and we'll see an example very quickly.", 'start': 7038.57, 'duration': 5.546}, {'end': 7050.018, 'text': 'You need to first import the module and then all the methods, all the functions,', 'start': 7044.676, 'duration': 5.342}, {'end': 7054.599, 'text': 'everything inside a module will have to be accessed using the name of the module.', 'start': 7050.018, 'duration': 4.581}], 'summary': 'Modules organize large python projects into files and folders, providing namespaces for accessing functions and methods.', 'duration': 27.745, 'max_score': 7026.854, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI7026854.jpg'}], 'start': 6125.713, 'title': 'Python functions, loan emi calculation, and organization', 'summary': 'Covers the importance of python functions, demonstrates the creation of a loan emi calculation function, compares eight-year and ten-year loans with a $2,000 difference in emis, explains loan emi calculation in python, and discusses the use of named arguments and function organization.', 'chapters': [{'end': 6460.958, 'start': 6125.713, 'title': 'Python functions and loan emi calculation', 'summary': 'Discusses the importance of writing and using 
functions in python, and then proceeds to demonstrate the step-by-step creation of a loan emi calculation function using local variables and scope rules.', 'duration': 335.245, 'highlights': ['The chapter discusses the importance of writing and using functions in Python, highlighting that spending more time writing and using functions improves programming skills.', 'It presents a problem of calculating EMIs for different loan options, specifically comparing an eight-year loan with 10% interest and a 10-year loan with 8% interest for a $1.26 million house purchase.', 'It demonstrates the step-by-step creation of a loan EMI calculation function, starting with simplifying the function to calculate monthly installments for a one-year duration and then extending it to include the loan duration in months, using local variables and explaining scope rules.', 'It explains the concept of local variables and scope rules, emphasizing that variables defined within a function are only accessible within that function.']}, {'end': 6637.714, 'start': 6460.998, 'title': 'Loan comparison and calculation', 'summary': 'Discusses the comparison between eight-year and ten-year loans, including the calculation of emis and the effect of down payment on loan amount, with a mention of a $2,000 difference in emis.', 'duration': 176.716, 'highlights': ['The EMI for the 10-year loan is smaller than that for the 8-year loan, as expected.', 'Using a return value, the difference between the EMIs of the two loans is calculated to be $2,000.', 'The down payment is considered, resulting in a $10,000 EMI for the 8-year loan with a $300,000 down payment.']}, {'end': 6822.114, 'start': 6637.994, 'title': 'Calculating loan emi in python', 'summary': 'Discusses the calculation of loan emi using python, including the formula and the derivation of the formula, and the implementation of the formula in python to calculate the equal monthly installment, with examples showing the impact of different parameters 
on the emi.', 'duration': 184.12, 'highlights': ['The chapter discusses the calculation of loan EMI using Python, including the formula and the derivation of the formula, and the implementation of the formula in Python to calculate the equal monthly installment.', 'Examples showing the impact of different parameters on the EMI are provided.', 'Option one has the lower EMI, allowing users to make informed decisions.']}, {'end': 7050.018, 'start': 6823.365, 'title': 'Named arguments and function organization', 'summary': 'Discusses the use of named arguments to avoid confusion and errors when invoking functions with multiple arguments, and it emphasizes the organization of functions into modules to avoid namespace collisions, which is critical for large python projects.', 'duration': 226.653, 'highlights': ['Named arguments can be used to specify the name of the argument before the actual value, allowing for clearer function invocation and the ability to change the order of arguments.', 'The use of functions within functions is a powerful technique, and Python provides a built-in function for rounding numbers up and down, which is part of the Python standard library.', 'Organizing functions into modules helps avoid namespace collisions, provides a way of structuring large Python projects, and offers the benefit of namespaces.']}], 'duration': 924.305, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI6125713.jpg', 'highlights': ['The chapter discusses the importance of writing and using functions in Python, emphasizing improved programming skills.', 'The EMI for the 10-year loan is smaller than that for the 8-year loan, as expected.', 'Using a return value, the difference between the EMIs of the two loans is calculated to be $2,000.', 'The down payment is considered, resulting in a $10,000 EMI for the 8-year loan with a $300,000 down payment.', 'The chapter discusses the calculation of loan EMI using Python, 
including the formula and the derivation of the formula.', 'Named arguments can be used to specify the name of the argument before the actual value, allowing for clearer function invocation and the ability to change the order of arguments.', 'Organizing functions into modules helps avoid namespace collisions, provides a way of structuring large Python projects, and offers the benefit of namespaces.', 'The use of functions within functions is a powerful technique, and Python provides a built-in function for rounding numbers up and down.']}, {'end': 8374.432, 'segs': [{'end': 7085.823, 'src': 'embed', 'start': 7050.018, 'weight': 0, 'content': [{'end': 7054.599, 'text': 'everything inside a module will have to be accessed using the name of the module.', 'start': 7050.018, 'duration': 4.581}, {'end': 7059.701, 'text': "So that makes sure that your global namespace that you're working with do not get, does not get affected.", 'start': 7054.639, 'duration': 5.062}, {'end': 7064.362, 'text': 'And that allows people like if you have 20 people or a hundred people working on a project.', 'start': 7060.101, 'duration': 4.261}, {'end': 7067.765, 'text': 'Everybody can write their own modules and they can use the same variable names.', 'start': 7064.862, 'duration': 2.903}, {'end': 7072.551, 'text': 'They can use the same function names, and that will not cause problems when you want to use those modules together.', 'start': 7067.805, 'duration': 4.746}, {'end': 7080.339, 'text': "Okay So here's an, by the way, you can write your own modules, but we are going to use some built-in modules from the Python standard library.", 'start': 7072.911, 'duration': 7.428}, {'end': 7085.823, 'text': "Okay So I'm going to import the math module that contains a lot of math related operations.", 'start': 7080.779, 'duration': 5.044}], 'summary': "Using modules in python ensures namespace isolation, enabling multiple people to work on projects without variable or function name conflicts. 
also, leveraging built-in modules like 'math' for various math operations.", 'duration': 35.805, 'max_score': 7050.018, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI7050018.jpg'}, {'end': 7403.524, 'src': 'embed', 'start': 7373.122, 'weight': 3, 'content': [{'end': 7375.984, 'text': "Okay Now let's suppose that there was no interest on this.", 'start': 7373.122, 'duration': 2.862}, {'end': 7380.187, 'text': 'So what we can do is we can simply get the EMI for the loan without interest.', 'start': 7376.365, 'duration': 3.822}, {'end': 7385.771, 'text': 'And then if we subtract the two EMIs, we get to know how much interest we are paying per month.', 'start': 7380.607, 'duration': 5.164}, {'end': 7388.633, 'text': 'And then we simply take that over the entire duration.', 'start': 7386.211, 'duration': 2.422}, {'end': 7390.414, 'text': 'So that gives us the total interest rates.', 'start': 7388.673, 'duration': 1.741}, {'end': 7391.534, 'text': "That's one way of solving it.", 'start': 7390.474, 'duration': 1.06}, {'end': 7392.435, 'text': 'There are other ways too.', 'start': 7391.595, 'duration': 0.84}, {'end': 7398.239, 'text': "So let's put in the amount, a hundred thousand, let's put in the duration and let us put in the rate of interest as zero.", 'start': 7393.015, 'duration': 5.224}, {'end': 7403.524, 'text': "Oops, but something seems to have gone wrong here and let's see what went wrong.", 'start': 7399.301, 'duration': 4.223}], 'summary': 'Calculating total interest paid by subtracting emis without interest from emis with interest over the entire duration.', 'duration': 30.402, 'max_score': 7373.122, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI7373122.jpg'}, {'end': 7949.111, 'src': 'embed', 'start': 7919.694, 'weight': 2, 'content': [{'end': 7921.816, 'text': 'using named arguments while invoking a function.', 'start': 7919.694, 
'duration': 2.122}, {'end': 7928.44, 'text': 'importing modules and using the library functions, reusing and improving functions to handle new use cases.', 'start': 7922.376, 'duration': 6.064}, {'end': 7931.942, 'text': 'So what we saw was we kept improving the loan EMI function over time.', 'start': 7928.52, 'duration': 3.422}, {'end': 7935.484, 'text': 'And this is something that you should do a lot, keep improving your functions.', 'start': 7932.342, 'duration': 3.142}, {'end': 7938.466, 'text': 'And we handled exceptions using try-except.', 'start': 7935.985, 'duration': 2.481}, {'end': 7941.128, 'text': 'And then finally we documented functions using docstrings.', 'start': 7938.786, 'duration': 2.342}, {'end': 7943.789, 'text': "So that's a lot of ground that we've covered in just 45 minutes.", 'start': 7941.168, 'duration': 2.621}, {'end': 7946.81, 'text': "So I hope you've been able to follow.", 'start': 7945.41, 'duration': 1.4}, {'end': 7949.111, 'text': 'And if not, you have this notebook.', 'start': 7947.311, 'duration': 1.8}], 'summary': 'Improved loan emi function, handled exceptions, documented functions in 45 minutes.', 'duration': 29.417, 'max_score': 7919.694, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI7919694.jpg'}, {'end': 8045.384, 'src': 'embed', 'start': 8016.188, 'weight': 4, 'content': [{'end': 8020.971, 'text': 'So all of these questions that you might have, the more curious you are, the more you break things, the more you learn.', 'start': 8016.188, 'duration': 4.783}, {'end': 8027.274, 'text': "And the best way to do that is to just type out the code yourself, because, as you're typing, you will wonder what can I change here?", 'start': 8021.431, 'duration': 5.843}, {'end': 8028.534, 'text': 'What can I break here?', 'start': 8027.614, 'duration': 0.92}, {'end': 8033.497, 'text': 'And the more you explore, the more you learn about Python, the better a programmer you will 
become.', 'start': 8029.014, 'duration': 4.483}, {'end': 8035.838, 'text': 'Okay All right.', 'start': 8034.177, 'duration': 1.661}, {'end': 8037.819, 'text': 'So now we have an exercise for you.', 'start': 8036.098, 'duration': 1.721}, {'end': 8039.801, 'text': 'This is not an assignment or anything.', 'start': 8038.22, 'duration': 1.581}, {'end': 8041.201, 'text': 'This is just for you to understand.', 'start': 8039.821, 'duration': 1.38}, {'end': 8045.384, 'text': "So here you're planning a leisure trip and you need to decide which city do you want to visit?", 'start': 8041.702, 'duration': 3.682}], 'summary': 'Curiosity and exploration lead to learning; typing code, breaking it, and exploring python improves programming skills.', 'duration': 29.196, 'max_score': 8016.188, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI8016188.jpg'}], 'start': 7050.018, 'title': 'Python module usage, math functions, exception handling, and practical exercises', 'summary': 'Covers the usage of modules in python for preventing namespace conflicts, importing built-in modules like math, using the math module for rounding and loan emi calculation, handling exceptions with try-except statements, and maximizing learning through practical coding exercises and forum participation.', 'chapters': [{'end': 7085.823, 'start': 7050.018, 'title': 'Module usage in python', 'summary': 'Explains the usage of modules in python, emphasizing how they prevent namespace conflicts and enable collaboration among multiple developers by allowing the use of the same variable and function names. 
it also introduces the concept of importing built-in modules from the python standard library, specifically the math module for math-related operations.', 'duration': 35.805, 'highlights': ['Modules in Python prevent namespace conflicts and enable collaboration among multiple developers by allowing the use of the same variable and function names.', 'Importing built-in modules from the Python standard library, such as the math module, provides access to a variety of math-related operations.']}, {'end': 7392.435, 'start': 7086.323, 'title': 'Math module and loan emi function', 'summary': "Discusses using the math module's ceil function for rounding up numbers and implementing a loan emi function to calculate monthly payments for different loans, showcasing the use of functions to solve diverse financial problems.", 'duration': 306.112, 'highlights': ["Implementing the math module's ceil function for rounding up numbers and updating the loan EMI function to incorporate this rounding up process, showcasing the practical application of functions in financial calculations", 'Utilizing a generic function to solve diverse financial problems by creating a loan EMI function that can be applied to different loan scenarios, thereby eliminating the need to remember complex formulas and rounding processes', 'Solving multiple loan scenarios with minimal lines of code, such as determining the monthly payments for a home loan and a car loan by applying the loan EMI function, showcasing the efficiency of the function in handling different financial computations', 'Applying a computational approach to determine the total interest paid on a loan by comparing the EMIs with and without interest, showcasing a methodical approach to calculating additional interest payments']}, {'end': 8016.148, 'start': 7393.015, 'title': 'Handling exceptions in python', 'summary': 'Explains how to handle exceptions in python using try-except statements, and demonstrates the process with a loan emi 
calculation function, highlighting the zero division error and the use of docstrings for function documentation.', 'duration': 623.133, 'highlights': ['The chapter explains how to handle exceptions in Python using try-except statements', 'Demonstrates the process with a loan EMI calculation function, highlighting the zero division error', 'Use of docstrings for function documentation']}, {'end': 8374.432, 'start': 8016.188, 'title': 'Maximizing learning through practical python exercises', 'summary': 'Emphasizes the importance of practical coding exercises, such as planning a trip using python to minimize expenses and maximize the trip duration, along with the significance of answering and asking questions on the forum to enhance learning.', 'duration': 358.244, 'highlights': ['The chapter emphasizes the importance of practical coding exercises, such as planning a trip using Python to minimize expenses and maximize the trip duration.', 'The significance of answering and asking questions on the forum to enhance learning is highlighted.', 'The chapter introduces the NumPy library for numerical computing in Python, providing specialized data structures and functions for working with numerical data.']}], 'duration': 1324.414, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI7050018.jpg', 'highlights': ['Importing built-in modules from the Python standard library, such as the math module, provides access to a variety of math-related operations.', 'Modules in Python prevent namespace conflicts and enable collaboration among multiple developers by allowing the use of the same variable and function names.', 'Utilizing a generic function to solve diverse financial problems by creating a loan EMI function that can be applied to different loan scenarios, thereby eliminating the need to remember complex formulas and rounding processes', 'Applying a computational approach to determine the total interest paid on a loan by 
comparing the EMIs with and without interest, showcasing a methodical approach to calculating additional interest payments', 'The chapter emphasizes the importance of practical coding exercises, such as planning a trip using Python to minimize expenses and maximize the trip duration.']}, {'end': 9968.356, 'segs': [{'end': 8546.787, 'src': 'embed', 'start': 8519.733, 'weight': 0, 'content': [{'end': 8528.477, 'text': 'So here the yield of apples in Kanto becomes the temperature times w1 plus the rainfall times w2 plus the humidity times w3.', 'start': 8519.733, 'duration': 8.744}, {'end': 8534.219, 'text': 'And once we do that, we get back the result 56.8.', 'start': 8529.317, 'duration': 4.902}, {'end': 8538.681, 'text': 'And we can print it out that the expected yield of apples in the Kanto region is 56.8 tons per hectare.', 'start': 8534.219, 'duration': 4.462}, {'end': 8546.787, 'text': "Okay, so far it's a pretty simple calculation that we've done with a bunch of variables,", 'start': 8541.764, 'duration': 5.023}], 'summary': 'The expected yield of apples in the kanto region is 56.8 tons per hectare.', 'duration': 27.054, 'max_score': 8519.733, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI8519733.jpg'}, {'end': 9096.884, 'src': 'embed', 'start': 9071.69, 'weight': 4, 'content': [{'end': 9076.733, 'text': 'concise and intuitive mathematical expressions like (kanto * weights).sum()', 'start': 9071.69, 'duration': 5.043}, {'end': 9078.941, 'text': 'and get the same results.', 'start': 9077.941, 'duration': 1}, {'end': 9088.203, 'text': 'But another important reason to use NumPy is performance, because NumPy operations and functions are implemented internally using C++,', 'start': 9079.381, 'duration': 8.822}, {'end': 9094.924, 'text': 'and Python offers a way where you can write something in C++ and expose it or connect it with a Python function.', 'start': 9088.203, 'duration': 6.721}, 
{'end': 9096.884, 'text': "So there's an interface available.", 'start': 9095.364, 'duration': 1.52}], 'summary': 'Numpy provides performance due to c implementation and interface with python functions.', 'duration': 25.194, 'max_score': 9071.69, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI9071690.jpg'}, {'end': 9235.43, 'src': 'embed', 'start': 9185.51, 'weight': 1, 'content': [{'end': 9191.079, 'text': 'And to do that over two vectors of size 1 million took about 200 milliseconds.', 'start': 9185.51, 'duration': 5.569}, {'end': 9196.327, 'text': "Now let's do the same thing, but this time let us use a NumPy array.", 'start': 9191.76, 'duration': 4.567}, {'end': 9204.991, 'text': 'So we have the same NumPy arrays, and then we just use np.dot, and that takes just 2.4 milliseconds.', 'start': 9197.307, 'duration': 7.684}, {'end': 9212.815, 'text': "So that's where you can see, we get the same result back, but np.dot is a hundred times faster than using a for loop.", 'start': 9205.651, 'duration': 7.164}, {'end': 9221.52, 'text': "And this is what makes NumPy especially useful when you're working with really large data sets with tens of thousands or even millions of data points.", 'start': 9213.436, 'duration': 8.084}, {'end': 9222.959, 'text': 'All right.', 'start': 9222.639, 'duration': 0.32}, {'end': 9224.561, 'text': "That's a benefit of NumPy.", 'start': 9223.32, 'duration': 1.241}, {'end': 9227.523, 'text': 'Now, before we go ahead, let us just save our work.', 'start': 9225.101, 'duration': 2.422}, {'end': 9232.688, 'text': "So I'm just going to import the Jovian library and run jovian.commit.", 'start': 9227.704, 'duration': 4.984}, {'end': 9235.43, 'text': 'And this will ask us for an API key.', 'start': 9233.769, 'duration': 1.661}], 'summary': 'Using numpy arrays for calculations is 100 times faster than using for loops, as demonstrated with vectors of size 1 million taking 
200 milliseconds with for loops and 2.4 milliseconds with numpy arrays.', 'duration': 49.92, 'max_score': 9185.51, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI9185510.jpg'}], 'start': 8375.433, 'title': 'Using numpy for predicting crop yield and efficient calculations', 'summary': 'Covers the use of numpy to predict crop yield, achieving an approximate yield of 56.8 tons per hectare for the kanto region. it also demonstrates efficiency through a hundredfold speed increase in calculations and showcases the flexibility, benefits, and power of numpy in simplifying complex operations.', 'chapters': [{'end': 8760.274, 'start': 8375.433, 'title': 'Predicting crop yield with numpy', 'summary': 'Introduces the use of numpy to predict the yield of apples in different regions based on climate data by formulating a linear equation and using a set of weights, and demonstrates the implementation through a function, resulting in an approximate yield of 56.8 tons per hectare for the kanto region using specific climate data and weights.', 'duration': 384.841, 'highlights': ['The chapter introduces the use of NumPy to predict the yield of apples in different regions based on climate data by formulating a linear equation and using a set of weights.', 'The implementation demonstrates the process through a function, resulting in an approximate yield of 56.8 tons per hectare for the Kanto region using specific climate data and weights.', 'The function facilitates the calculation of crop yield by performing element-wise multiplication and addition based on the given climate data and weights.']}, {'end': 9212.815, 'start': 8760.855, 'title': 'Numpy arrays and dot product', 'summary': 'Discusses the use of numpy arrays to perform element-wise multiplication and dot product calculations, demonstrating the efficiency of numpy operations with a specific example showing a hundredfold increase in speed when using np dot dot over a 
for loop.', 'duration': 451.96, 'highlights': ['The NP dot dot function in NumPy allows for the calculation of the dot product of two vectors, providing a more efficient and concise way of performing this operation compared to traditional methods, as demonstrated by achieving the same result with even lower level operations.', 'NumPy operations and functions are implemented internally using C plus, resulting in significantly faster calculations, as illustrated by a specific example where performing the dot product using a for loop took about 200 milliseconds, while using NP dot dot with NumPy arrays took only 2.4 milliseconds, showcasing a hundredfold increase in speed.', 'The chapter introduces the concept of NumPy arrays, highlighting their ability to support arithmetic operators and methods such as dot sum, allowing for convenient element-wise multiplication and summation of arrays, providing a more intuitive and efficient approach compared to Python lists.']}, {'end': 9552.286, 'start': 9213.436, 'title': "Numpy's flexibility and benefits", 'summary': 'Discusses the flexibility and benefits of numpy, including its ability to handle large datasets, capture snapshots of notebooks, and create multi-dimensional arrays, with examples of 1d, 2d, and 3d arrays, and a detailed explanation of how to inspect the length along each dimension using the dot shape property of an array.', 'duration': 338.85, 'highlights': ["The chapter explains NumPy's flexibility in handling large datasets with tens of thousands or even millions of data points, capturing snapshots of notebooks using the Jovian library, and creating multi-dimensional arrays to represent climate data for all regions, with examples of 1D, 2D, and 3D arrays. (Relevance: 5)", 'The chapter provides a detailed explanation of inspecting the length along each dimension using the dot shape property of an array, including examples of how to arrive at the value of each dimension and the shape of 1D, 2D, and 3D arrays. 
(Relevance: 4)', 'The chapter demonstrates the creation of a two-dimensional NumPy array to represent climate data for all regions, with each element in the list representing a data point about the weather, and explains the concept of a matrix or table of values, with five rows and three columns. (Relevance: 3)']}, {'end': 9968.356, 'start': 9553.413, 'title': 'Working with numpy arrays and matrix multiplication', 'summary': 'Explains the importance of having the same number of elements in numpy arrays, the significance of data types for efficiency and performance, and the process of matrix multiplication using numpy, ultimately showcasing the power of numpy in simplifying complex operations.', 'duration': 414.943, 'highlights': ['The importance of having the same number of elements in NumPy arrays', 'Significance of data types for efficiency and performance', 'Process of matrix multiplication using NumPy']}], 'duration': 1592.923, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI8375433.jpg', 'highlights': ['Approximate yield of 56.8 tons per hectare for Kanto region', 'Hundredfold speed increase in calculations using NumPy', "NumPy's flexibility in handling large datasets", 'Efficient matrix multiplication using NumPy', "Efficiency through NumPy's operations and functions"]}, {'end': 11504.019, 'segs': [{'end': 10017.438, 'src': 'embed', 'start': 9968.416, 'weight': 1, 'content': [{'end': 9972.298, 'text': 'So we just want to download it in the same directory as the Jupyter notebook.', 'start': 9968.416, 'duration': 3.882}, {'end': 9975.661, 'text': "So I'm just going to give the file name climate.txt.", 'start': 9972.859, 'duration': 2.802}, {'end': 9987.384, 'text': 'So now the file should be downloaded and you can check this by going file open, and you should be able to see here that you have climate.txt here.', 'start': 9976.892, 'duration': 10.492}, {'end': 9994.131, 'text': 'If you click on climate.txt 
here, you will find that this is the, as I mentioned, this is the header.', 'start': 9987.944, 'duration': 6.187}, {'end': 9998.036, 'text': 'It describes what each column contains, and these are all the data points.', 'start': 9994.712, 'duration': 3.324}, {'end': 10005.247, 'text': 'And there are about 10,000, you can verify it, but a better way is to just load it up as a NumPy array.', 'start': 9999.161, 'duration': 6.086}, {'end': 10010.852, 'text': "So when a CSV file contains all numbers, it's very easy to load it up as a NumPy array.", 'start': 10005.867, 'duration': 4.985}, {'end': 10014.836, 'text': 'So the way to do that is using the genfromtxt function.', 'start': 10011.452, 'duration': 3.384}, {'end': 10017.438, 'text': 'So np.genfromtxt.', 'start': 10015.436, 'duration': 2.002}], 'summary': "File 'climate.txt' with 10,000 data points downloaded and loaded using np.genfromtxt.", 'duration': 49.022, 'max_score': 9968.416, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI9968416.jpg'}, {'end': 10106.096, 'src': 'embed', 'start': 10079.974, 'weight': 0, 'content': [{'end': 10085.636, 'text': 'But if you check the shape, you will find that there are 10,000 rows and each row contains three elements.', 'start': 10079.974, 'duration': 5.662}, {'end': 10087.897, 'text': 'So temperature, rainfall, and humidity.', 'start': 10086.076, 'duration': 1.821}, {'end': 10097.116, 'text': 'So we can now use a matrix multiplication operator to predict the yield of apples for the entire dataset using a given set of weights.', 'start': 10089.028, 'duration': 8.088}, {'end': 10101.96, 'text': "So let's take these weights 0.3, 0.2, 0.5, the same weights that we had earlier.", 'start': 10097.256, 'duration': 4.704}, {'end': 10103.562, 'text': 'And let us just.', 'start': 10102.681, 'duration': 0.881}, {'end': 10106.096, 'text': 'So let us just multiply it.', 'start': 10104.775, 'duration': 1.321}], 'summary': 'Using 10,000 rows 
of data, a matrix multiplication operator is used to predict apple yield with weights 0.3, 0.2, 0.5.', 'duration': 26.122, 'max_score': 10079.974, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI10079974.jpg'}, {'end': 10179.306, 'src': 'embed', 'start': 10150.987, 'weight': 3, 'content': [{'end': 10155.49, 'text': 'And then we can write it back to a file, right? So this is the common flow that you get some data file.', 'start': 10150.987, 'duration': 4.503}, {'end': 10162.956, 'text': 'You pull it in, you perform some analysis, you perform some operations, you get some results and then you take those results,', 'start': 10155.57, 'duration': 7.386}, {'end': 10165.097, 'text': 'create some output and write it back to a file.', 'start': 10162.956, 'duration': 2.141}, {'end': 10166.619, 'text': 'So to do that.', 'start': 10166.018, 'duration': 0.601}, {'end': 10172.344, 'text': 'Now to add the yields back to the climate data as a fourth column.', 'start': 10167.543, 'duration': 4.801}, {'end': 10176.045, 'text': 'So we are going to use the numpy.concatenate function.', 'start': 10172.845, 'duration': 3.2}, {'end': 10179.306, 'text': "Okay So I'm going to call numpy.concatenate.", 'start': 10176.286, 'duration': 3.02}], 'summary': 'Common flow: data import, analysis, operations, results, output creation, file write. 
adding yields as fourth column using numpy.concatenate.', 'duration': 28.319, 'max_score': 10150.987, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI10150987.jpg'}, {'end': 10442.229, 'src': 'embed', 'start': 10419.894, 'weight': 2, 'content': [{'end': 10427.919, 'text': 'So as I just mentioned, NumPy provides hundreds of functions for performing these operations on arrays, and here are some common functions.', 'start': 10419.894, 'duration': 8.025}, {'end': 10430.621, 'text': 'So for mathematics, you will find numpy.sum.', 'start': 10427.939, 'duration': 2.682}, {'end': 10438.667, 'text': 'So you can not only sum the entire array, but you can also sum just along a particular dimension or along one or more dimensions.', 'start': 10431.042, 'duration': 7.625}, {'end': 10442.229, 'text': 'Then you have numpy.exp, numpy.round.', 'start': 10439.167, 'duration': 3.062}], 'summary': 'Numpy offers hundreds of functions, including numpy.sum and numpy.exp, for array operations.', 'duration': 22.335, 'max_score': 10419.894, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI10419894.jpg'}, {'end': 10492.746, 'src': 'embed', 'start': 10464.511, 'weight': 7, 'content': [{'end': 10467.916, 'text': 'And it can sometimes be hard to find exactly what you need.', 'start': 10464.511, 'duration': 3.405}, {'end': 10472.724, 'text': 'So the easiest way to find the right function is to do a web search.', 'start': 10468.497, 'duration': 4.227}, {'end': 10477.13, 'text': 'So just do a search on any search engine, Google, DuckDuckGo.', 'start': 10473.425, 'duration': 3.705}, {'end': 10484.755, 'text': 'wherever, and try to formulate what you want to do into a single line as clearly as possible.', 'start': 10478.186, 'duration': 6.569}, {'end': 10489.201, 'text': 'So for, for example, what I searched was how to join NumPy arrays.', 'start': 10484.875, 'duration': 
4.326}, {'end': 10492.746, 'text': 'And that led me to a tutorial on array concatenation.', 'start': 10489.762, 'duration': 2.984}], 'summary': 'Use web search to find the right function for specific tasks, e.g. how to join numpy arrays.', 'duration': 28.235, 'max_score': 10464.511, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI10464511.jpg'}, {'end': 10684.157, 'src': 'embed', 'start': 10656.33, 'weight': 6, 'content': [{'end': 10660.574, 'text': 'So you can take array two and you can find the remainder of array two with a number like four.', 'start': 10656.33, 'duration': 4.244}, {'end': 10662.836, 'text': 'And these are all the remainders.', 'start': 10661.575, 'duration': 1.261}, {'end': 10670.183, 'text': "Now that's great, but NumPy arrays also support something called broadcasting.", 'start': 10663.977, 'duration': 6.206}, {'end': 10679.814, 'text': 'And what that does is that it allows arithmetic operations between two arrays having a different number of dimensions, right?', 'start': 10671.289, 'duration': 8.525}, {'end': 10684.157, 'text': 'So they can have a different number of dimensions, but their shapes must be compatible.', 'start': 10680.214, 'duration': 3.943}], 'summary': 'Numpy arrays support broadcasting for arithmetic operations with different dimensions.', 'duration': 27.827, 'max_score': 10656.33, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI10656330.jpg'}, {'end': 11081.685, 'src': 'embed', 'start': 11051.915, 'weight': 9, 'content': [{'end': 11058.016, 'text': "So it extends Python's indexing notation to multiple dimensions and it does so in a fairly intuitive fashion.", 'start': 11051.915, 'duration': 6.101}, {'end': 11065.858, 'text': 'So what you can do is you can take, you can provide a set of, you can provide a comma separated list of indices or even ranges,', 'start': 11058.076, 'duration': 7.782}, {'end': 11070.079, 
'text': 'and you can use that to select a specific element from the entire array.', 'start': 11065.858, 'duration': 4.221}, {'end': 11073.22, 'text': "So let's take this array and you can select a specific element.", 'start': 11070.259, 'duration': 2.961}, {'end': 11073.98, 'text': "Let's say this one.", 'start': 11073.32, 'duration': 0.66}, {'end': 11081.685, 'text': 'Or you can select specific portions, right? You can even select just what are called slices or subarrays from the NumPy array.', 'start': 11074.64, 'duration': 7.045}], 'summary': "Numpy extends python's indexing to multiple dimensions for intuitive selection.", 'duration': 29.77, 'max_score': 11051.915, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI11051915.jpg'}], 'start': 9968.416, 'title': 'Manipulating climate data with numpy', 'summary': 'Discusses the process of manipulating climate data using numpy by separating data points, performing matrix multiplication to predict yields, adding a new yield column, and writing the results back to a file. 
it also explains how to find the right function, demonstrating the concept of broadcasting and the use of comparison operators to count the number of equal elements in two arrays.', 'chapters': [{'end': 10039.173, 'start': 9968.416, 'title': 'Downloading and loading climate data', 'summary': 'Discusses the process of downloading a climate data file named climate.txt, checking its presence, and loading it using the np.genfromtxt function, which can handle about 10,000 data points.', 'duration': 70.757, 'highlights': ['The np.genfromtxt function is used to load the climate data file, which contains about 10,000 data points.', 'The process involves downloading the file named climate.txt and checking its presence in the current directory.', 'The climate.txt file contains a header describing each column and numerous data points.']}, {'end': 10418.833, 'start': 10039.793, 'title': 'Manipulating climate data with numpy', 'summary': 'Demonstrates how to manipulate climate data using numpy by separating data points, performing matrix multiplication to predict yields, adding a new yield column, and writing the results back to a file.', 'duration': 379.04, 'highlights': ['Performing matrix multiplication to predict yields for 10,000 data rows using a given set of weights (0.3 0.2 0.5).', 'Adding a new yield column to the climate data and writing the results back to a file using numpy.concatenate and numpy.savetxt.', 'Providing the delimiter and skipping the header row while reading and writing the data.']}, {'end': 11010.268, 'start': 10419.894, 'title': 'Numpy functions and operations', 'summary': 'Explores numpy functions for array operations, including arithmetic operations, broadcasting, and comparison operations, and explains how to find the right function, demonstrating the concept of broadcasting and the use of comparison operators to count the number of equal elements in two arrays.', 'duration': 590.374, 'highlights': ['NumPy provides hundreds of functions for 
performing operations on arrays, including mathematics, array manipulation, matrix multiplications, dot products, transpose eigenvalues, and statistical functions like mean, median, and standard deviation.', 'The chapter demonstrates how to find the right function by doing a web search and formulating the desired operation into a single line, along with the recommendation to explore the documentation and ask on the Jovian or ML forum for assistance.', 'Explains arithmetic operations on NumPy arrays, including element-wise operations, performing arithmetic operations with a single number, and demonstrates element-wise subtraction, division, multiplication, and remainder operations.', 'Introduces the concept of broadcasting in NumPy, allowing arithmetic operations between arrays with different dimensions but compatible shapes, with a visual explanation and a reminder that broadcasting only works if one array can be replicated to match the shape of the other array.', 'Discusses the use of comparison operations in NumPy arrays and demonstrates how to count the number of equal elements in two arrays using comparison operators.']}, {'end': 11504.019, 'start': 11010.268, 'title': 'Numpy array operations and indexing', 'summary': 'Covers array operations and indexing in numpy, including broadcasting, multi-dimensional indexing, and range selection, providing a comprehensive understanding of how to manipulate and extract data from numpy arrays.', 'duration': 493.751, 'highlights': ['NumPy array indexing allows for selection of specific elements and subarrays using intuitive multi-dimensional notation.', 'The chapter provides a detailed walkthrough of multi-dimensional indexing, demonstrating how to select specific elements and portions from a NumPy array.', 'The chapter explains the process of indexing using ranges, showcasing how to retrieve a range of elements along each dimension of a NumPy array.']}], 'duration': 1535.603, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI9968416.jpg', 'highlights': ['Performing matrix multiplication to predict yields for 10,000 data rows using a given set of weights (0.3 0.2 0.5).', 'The np.genfromtxt function is used to load the climate data file, which contains about 10,000 data points.', 'NumPy provides hundreds of functions for performing operations on arrays, including mathematics, array manipulation, matrix multiplications, dot products, transpose eigenvalues, and statistical functions like mean, median, and standard deviation.', 'Adding a new yield column to the climate data and writing the results back to a file using numpy.concatenate and numpy.savetxt.', 'The climate.txt file contains a header describing each column and numerous data points.', 'Explains arithmetic operations on NumPy arrays, including element-wise operations, performing arithmetic operations with a single number, and demonstrates element-wise subtraction, division, multiplication, and remainder operations.', 'Introduces the concept of broadcasting in NumPy, allowing arithmetic operations between arrays with different dimensions but compatible shapes, with a visual explanation and a reminder that broadcasting only works if one array can be replicated to match the shape of the other array.', 'The chapter demonstrates how to find the right function by doing a web search and formulating the desired operation into a single line, along with the recommendation to explore the documentation and ask on the Jovian or ML forum for assistance.', 'The process involves downloading the file named climate.txt and checking its presence in the current directory.', 'The chapter provides a detailed walkthrough of multi-dimensional indexing, demonstrating how to select specific elements and portions from a NumPy array.']}, {'end': 13118.431, 'segs': [{'end': 11902.18, 'src': 'embed', 'start': 11870.114, 'weight': 0, 'content': [{'end': 11874.097, 
'text': "So that's that I'll just make a one final commit here.", 'start': 11870.114, 'duration': 3.983}, {'end': 11878.714, 'text': "And let's just do a quick recap of all the topics that we just covered.", 'start': 11875.373, 'duration': 3.341}, {'end': 11886.296, 'text': 'So for the first thing that we saw was how to go from Python lists to NumPy arrays and why we should do that, why we should consider it.', 'start': 11879.154, 'duration': 7.142}, {'end': 11889.077, 'text': 'We also looked at how to operate on NumPy arrays.', 'start': 11886.696, 'duration': 2.381}, {'end': 11892.477, 'text': 'So we looked at the NP dot dot function or the matrix multiplication.', 'start': 11889.117, 'duration': 3.36}, {'end': 11896.038, 'text': 'Then we looked at the benefits of using NumPy arrays over lists.', 'start': 11893.058, 'duration': 2.98}, {'end': 11902.18, 'text': 'And then we saw the, we saw how we can go beyond just one dimension using multi-dimensional NumPy arrays.', 'start': 11896.458, 'duration': 5.722}], 'summary': 'Covered python lists to numpy arrays, benefits, multi-dimensional arrays.', 'duration': 32.066, 'max_score': 11870.114, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI11870114.jpg'}, {'end': 12076.375, 'src': 'embed', 'start': 12044.685, 'weight': 1, 'content': [{'end': 12048.932, 'text': 'So if I come back to the course page, so that is zero to pandas.com.', 'start': 12044.685, 'duration': 4.247}, {'end': 12054.718, 'text': 'Now here on the course page, you can see that assignment two is now live.', 'start': 12050.595, 'duration': 4.123}, {'end': 12060.843, 'text': 'And the objective of this assignment is to have a build a solid understanding of some NumPy array operations.', 'start': 12055.439, 'duration': 5.404}, {'end': 12065.947, 'text': 'So what you need to do is you need to pick five interesting NumPy array functions.', 'start': 12060.883, 'duration': 5.064}, {'end': 12068.929, 'text': 
'So you will do that by going through the documentation.', 'start': 12066.407, 'duration': 2.522}, {'end': 12076.375, 'text': 'So specifically by going through this page on routines, and then you will run and modify a starter notebook.', 'start': 12069.169, 'duration': 7.206}], 'summary': 'Assignment two on zerotopandas.com is live, focusing on understanding numpy array operations by picking five functions and modifying a starter notebook.', 'duration': 31.69, 'max_score': 12044.685, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI12044685.jpg'}, {'end': 12392.802, 'src': 'embed', 'start': 12364.474, 'weight': 2, 'content': [{'end': 12369.7, 'text': 'Okay. So now you can put in the notebook link and this has to be a Jovian notebook link and click submit.', 'start': 12364.474, 'duration': 5.226}, {'end': 12374.805, 'text': 'And that is going to then put it into evaluation and then we will do the evaluation.', 'start': 12370.46, 'duration': 4.345}, {'end': 12383.374, 'text': "Basically, what we're looking for is you should have five functions and you should have some explanation and three examples for each function, two that work and one that breaks.", 'start': 12374.905, 'duration': 8.469}, {'end': 12385.056, 'text': 'And that is the whole idea here.', 'start': 12383.895, 'duration': 1.161}, {'end': 12392.802, 'text': "But after doing that, after you're done with the submission, there is an optional, but important part that you should also do.", 'start': 12386.3, 'duration': 6.502}], 'summary': 'Submit jovian notebook link with 5 functions, 3 examples each, and optional additional part.', 'duration': 28.328, 'max_score': 12364.474, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI12364474.jpg'}, {'end': 12617.235, 'src': 'embed', 'start': 12595.159, 'weight': 4, 'content': [{'end': 12604.566, 'text': 'So this is an optional exercise, but once again, if 
you want to really master a Python and master NumPy, we highly recommend doing these a hundred,', 'start': 12595.159, 'duration': 9.407}, {'end': 12607.028, 'text': 'or at least try and do maybe 20 or 30 of these.', 'start': 12604.566, 'duration': 2.462}, {'end': 12610.991, 'text': 'pick a random set of 20 and try and do these exercises.', 'start': 12607.028, 'duration': 3.963}, {'end': 12613.012, 'text': "They're also marked by a difficulty.", 'start': 12611.031, 'duration': 1.981}, {'end': 12614.433, 'text': "So that's up to you.", 'start': 12613.512, 'duration': 0.921}, {'end': 12617.235, 'text': 'And what you can do is.', 'start': 12615.274, 'duration': 1.961}], 'summary': 'Practice 20-30 optional python exercises to master numpy and improve skills.', 'duration': 22.076, 'max_score': 12595.159, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI12595159.jpg'}, {'end': 12702.662, 'src': 'embed', 'start': 12678.246, 'weight': 3, 'content': [{'end': 12686.115, 'text': 'So with that, we complete our discussion of NumPy and I want to cover another topic here since we have some time.', 'start': 12678.246, 'duration': 7.869}, {'end': 12687.837, 'text': 'So this is.', 'start': 12687.036, 'duration': 0.801}, {'end': 12694.015, 'text': "We've looked at how to use NumPy and we've also seen briefly how NumPy works with files.", 'start': 12689.071, 'duration': 4.944}, {'end': 12702.662, 'text': 'but it is also a useful thing to know how to work with files and the operating system interact with the operating system in general using pure Python.', 'start': 12694.015, 'duration': 8.647}], 'summary': 'Discussion on numpy completed, now moving on to working with files and the operating system using pure python.', 'duration': 24.416, 'max_score': 12678.246, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI12678246.jpg'}], 'start': 11504.459, 'title': 'Numpy array 
creation, understanding functions, and operating system in python', 'summary': 'Delves into methods for creating numpy arrays like np.array, np.zeros, np.ones, np.random, np.arange, and np.linspace. it emphasizes understanding selected numpy array functions through experimentation and covers assignment submission, evaluation criteria, and optional numpy exercises. additionally, it discusses using numpy and working with files and the operating system in python.', 'chapters': [{'end': 12013.147, 'start': 11504.459, 'title': 'Numpy array creation and manipulation', 'summary': 'The chapter covers various methods for creating numpy arrays, including np.array, np.zeros, np.ones, np.random, np.arange, reshaping arrays, and np.linspace. it also provides an overview of the vast array of functions and routines offered by numpy, highlighting its utility and versatility.', 'duration': 508.688, 'highlights': ['The chapter covers various methods for creating NumPy arrays, including np.array, np.zeros, np.ones, np.random, np.arange, reshaping arrays, and np.linspace.', 'It also provides an overview of the vast array of functions and routines offered by NumPy, highlighting its utility and versatility.']}, {'end': 12294.999, 'start': 12014.301, 'title': 'Understanding numpy functions', 'summary': 'The chapter emphasizes the importance of experimenting with a few selected numpy array functions, as part of an assignment to build a solid understanding, and provide three examples for each function, including two working examples and one example that breaks, aiming to ensure a thorough comprehension of the functions.', 'duration': 280.698, 'highlights': ['The assignment requires picking five interesting NumPy array functions and providing three examples for each function, including two working examples and one example that breaks.', 'The importance of experimenting with the functions and coming up with unique examples that were not found in the documentation is emphasized.', 'The need to write a 
short introduction about NumPy, list the chosen functions, and provide a basic explanation about each function before giving three examples is outlined.', 'The requirement to save the notebook and list the chosen functions inside a code cell is mentioned.', "The assignment instructs to edit the information for each function, add an explanation about the function in one's own words, experiment with it, and give three examples, including one that breaks."]}, {'end': 12657.321, 'start': 12294.999, 'title': 'Assignment submission and optional numpy exercises', 'summary': 'Covers the process of submitting an assignment on jovian.com, including saving the notebook to the jovian profile, making a submission, and the evaluation criteria, along with a recommendation to engage in an optional exercise of 100 numpy problems for mastering python and numpy.', 'duration': 362.322, 'highlights': ['The process of submitting an assignment on jovian.com involves saving the notebook to the Jovian profile and making a submission on the assignment page, followed by evaluation criteria of five functions with three examples for each function to work and one that breaks.', 'A recommendation is made for engaging in an optional exercise of 100 NumPy problems to master Python and NumPy, with a suggestion to try at least 20 or 30 of these exercises, marked by difficulty, and to seek help from a dedicated forum thread.']}, {'end': 13118.431, 'start': 12657.721, 'title': 'Using numpy and working with operating system in python', 'summary': 'Covers the exercises, required assignment, and optional numpy exercises, and then discusses using numpy and working with files and the operating system in python, demonstrating how to check the current working directory, list files, create directories, download files, and read file contents.', 'duration': 460.71, 'highlights': ["The chapter covers the exercises, required assignment, and optional NumPy exercises, providing enough content for at least a week's 
worth of work.", 'Demonstrates how to work with files and the operating system in Python, including checking the current working directory, listing files, creating directories, and downloading files.', 'Shows how to read the contents of a file using the open function and a file object, including using the read method to obtain the entire contents of the file as a string.']}], 'duration': 1613.972, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI11504459.jpg', 'highlights': ['The chapter covers various methods for creating NumPy arrays, including np.array, np.zeros, np.ones, np.random, np.arange, reshaping arrays, and np.linspace.', 'The assignment requires picking five interesting NumPy array functions and providing three examples for each function, including two working examples and one example that breaks.', 'The process of submitting an assignment on jovian.com involves saving the notebook to the Jovian profile and making a submission on the assignment page, followed by evaluation criteria of five functions with three examples for each function, two that work and one that breaks.', 'Demonstrates how to work with files and the operating system in Python, including checking the current working directory, listing files, creating directories, and downloading files.', 'A recommendation is made for engaging in an optional exercise of 100 NumPy problems to master Python and NumPy, with a suggestion to try at least 20 or 30 of these exercises, marked by difficulty, and to seek help from a dedicated forum thread.']}, {'end': 15387.672, 'segs': [{'end': 13208.676, 'src': 'embed', 'start': 13180.225, 'weight': 4, 'content': [{'end': 13183.767, 'text': 'So now we get that the IO operation on the closed file is not valid.', 'start': 13180.225, 'duration': 3.542}, {'end': 13189.531, 'text': "Okay. So let's now try and process the CSV file.", 'start': 13184.868, 'duration': 4.663}, {'end': 13191.172, 'text': 'Okay but before 
we do that.', 'start': 13189.711, 'duration': 1.461}, {'end': 13194.313, 'text': 'Because we always need to open a file.', 'start': 13191.972, 'duration': 2.341}, {'end': 13197.653, 'text': 'So whenever we open a file, we always need to close it as well.', 'start': 13194.373, 'duration': 3.28}, {'end': 13200.614, 'text': 'So Python has some special syntax for it.', 'start': 13198.053, 'duration': 2.561}, {'end': 13202.554, 'text': 'So there is something called a with statement.', 'start': 13200.974, 'duration': 1.58}, {'end': 13208.676, 'text': 'Now this with statement is used for a lot of things, but the most common use case is with open.', 'start': 13203.155, 'duration': 5.521}], 'summary': "Python has special syntax for opening and closing files, using 'with open' statement.", 'duration': 28.451, 'max_score': 13180.225, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI13180225.jpg'}, {'end': 14149.254, 'src': 'embed', 'start': 14105.326, 'weight': 2, 'content': [{'end': 14107.327, 'text': 'But this is just, I wanted to show you something different.', 'start': 14105.326, 'duration': 2.001}, {'end': 14116.913, 'text': 'Okay. Now, one interesting thing to notice here is that this read CSV function that we have defined,', 'start': 14108.448, 'duration': 8.465}, {'end': 14125.879, 'text': 'this is actually generic enough that this can parse any CSV format, not necessarily just this specific format of home loans, right?', 'start': 14116.913, 'duration': 8.966}, {'end': 14131.843, 'text': 'So it can have a CSV file with any number of rows and any number of columns.', 'start': 14126.299, 'duration': 5.544}, {'end': 14136.266, 'text': 'And it can also handle missing values in just about 15, 20 lines of code.', 'start': 14132.503, 'duration': 3.763}, {'end': 14139.488, 'text': "We've written a pretty powerful function that can be pretty helpful.", 'start': 14136.306, 'duration': 3.182}, {'end': 14143.67, 
'And what you should do is over time, you should start building a repository of your own functions.', 'start': 14139.588, 'duration': 4.082}, {'end': 14149.254, 'text': 'Maybe you should just keep a Python file somewhere on your GitHub or just on your computer, where you keep,', 'start': 14144.071, 'duration': 5.183}], 'summary': 'The read csv function can handle any csv format, including handling missing values, in about 15-20 lines of code.', 'duration': 43.928, 'max_score': 14105.326, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI14105326.jpg'}, {'end': 14201.088, 'src': 'embed', 'start': 14171.329, 'weight': 0, 'content': [{'end': 14178.752, 'text': 'where we defined a function called loan EMI that takes the amount for a loan, the duration, the rate of interest and a down payment.', 'start': 14171.329, 'duration': 7.423}, {'end': 14186.437, 'text': 'And it returns, it performs a calculation and returns the equal monthly installment for the repayment of the loan.', 'start': 14179.392, 'duration': 7.045}, {'end': 14190.56, 'text': "And this we've covered in a lot of detail in the previous lecture.", 'start': 14186.737, 'duration': 3.823}, {'end': 14193.722, 'text': 'So please refer to lesson two and refer to the part on functions.', 'start': 14190.6, 'duration': 3.122}, {'end': 14201.088, 'text': 'Okay So what we want to do now is we have this function that operates on a single loan, and then we have a list of loans.', 'start': 14194.983, 'duration': 6.105}], 'summary': "Defined function 'loan emi' calculates monthly installment for loan repayment.", 'duration': 29.759, 'max_score': 14171.329, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI14171329.jpg'}, {'end': 14842.947, 'src': 'embed', 'start': 14816.171, 'weight': 1, 'content': [{'end': 14823.52, 'text': "as the instructions might vary slightly based on your operating system and the version of 
Python that you're using,", 'start': 14816.171, 'duration': 7.349}, {'end': 14826.023, 'text': 'but you should be able to follow along with this quite easily.', 'start': 14823.52, 'duration': 2.503}, {'end': 14831.963, 'text': 'Okay So pandas is typically used for working with tabular data.', 'start': 14827.222, 'duration': 4.741}, {'end': 14838.025, 'text': 'And when I say tabulated data, you can think of the data stored in a spreadsheet or in a database table.', 'start': 14832.584, 'duration': 5.441}, {'end': 14842.947, 'text': 'And the first thing that you might want to do is to actually read a data file.', 'start': 14838.545, 'duration': 4.402}], 'summary': 'Learn to use pandas for tabular data manipulation in python.', 'duration': 26.776, 'max_score': 14816.171, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI14816171.jpg'}], 'start': 13118.952, 'title': 'Handling csv data in python', 'summary': "Explains how to read and process csv files in python, detailing steps like handling missing values, converting data into dictionaries. 
it also covers processing italy's covid data, calculating emis, and working with files in python.", 'chapters': [{'end': 13576.157, 'start': 13118.952, 'title': 'Reading csv file and processing data', 'summary': 'Explains how to read and process csv files in python, emphasizing the importance of closing files to free up system memory and detailing the steps to read file contents, handle missing values, and convert data into a list of dictionaries for easier manipulation.', 'duration': 457.205, 'highlights': ['The chapter explains how to read and process CSV files in Python, emphasizing the importance of closing files to free up system memory.', 'Detailing the steps to read file contents, handle missing values, and convert data into a list of dictionaries for easier manipulation.', 'Explaining the process of parsing the file into a list of dictionaries, providing a more user-friendly and efficient data structure for further analysis.']}, {'end': 13851.611, 'start': 13577.148, 'title': 'Csv data processing', 'summary': 'Explains the step-by-step process of reading and processing csv data, including defining functions to parse headers and values, handling edge cases, and identifying and resolving errors, exemplified through specific code examples and explanations.', 'duration': 274.463, 'highlights': ['Defining functions to parse headers and values', 'Identifying and resolving errors, including handling edge cases', 'Step-by-step process of reading and processing CSV data']}, {'end': 14149.254, 'start': 13851.811, 'title': 'Parsing and handling csv data with python', 'summary': 'Covers parsing and handling csv data, including stripping, splitting, handling missing values, converting to float, creating dictionaries of data, and creating a generic read csv function in python, which can handle any csv format and missing values in about 15-20 lines of code.', 'duration': 297.443, 'highlights': ['The read CSV function can parse any CSV format, not just the specific format 
of home loans, and handle missing values in about 15-20 lines of code.', 'Creating a generic read CSV function in Python, which can handle any CSV format and missing values in about 15-20 lines of code.', 'Parsing and handling CSV data, including stripping, splitting, handling missing values, and converting to float.', 'Creating dictionaries of data by converting values and headers together into a dictionary.', 'Iterating through the data line, stripping, and splitting, and appending the value zero for empty strings.']}, {'end': 14440.617, 'start': 14149.254, 'title': 'Calculate emis and write to file', 'summary': 'Covers the process of calculating emis for a list of loans using a defined function and then writing the loan information along with the emis back to a file, demonstrating the implementation of the logic and the creation of a function to write to a csv file.', 'duration': 291.363, 'highlights': ["The function 'loan EMI' is defined to calculate the equal monthly installment for the repayment of a loan based on the loan amount, duration, rate of interest, and down payment.", 'A function is created to compute EMIs for all the loans in a list, adding the calculated EMIs to the existing loan information, and a separate function is defined to write the loan details along with the EMIs back to a file.', 'The process demonstrates the implementation of logic to calculate EMIs for a list of loans and the creation of a function to write the loan information along with the calculated EMIs back to a file.']}, {'end': 14902.784, 'start': 14441.638, 'title': 'Working with files in python', 'summary': 'Demonstrates working with files in python, including reading and writing files using python, creating functions for file operations, and using python for processing large amounts of data, with a focus on analyzing tabular data with pandas.', 'duration': 461.146, 'highlights': ['Pandas is typically used for working with tabular data, providing helper functions to read data 
from various file formats like CSVs, Excel, spreadsheets, HTML, JSON, and more.', 'The chapter emphasizes the importance of creating functions for file operations and using Python for processing large amounts of data.', 'The transcript covers the process of reading and writing files using Python, iterating through files, computing EMIs, and writing the results back to new files.']}, {'end': 15387.672, 'start': 14903.925, 'title': "Downloading and analyzing italy's covid data", 'summary': "Covers the process of downloading and analyzing italy's covid data, including the format of the file, reading the data into a data frame using pandas, and obtaining basic statistical information such as mean, standard deviation, and maximum values for new cases, deaths, and tests.", 'duration': 483.747, 'highlights': ['Italy COVID data file downloaded using URL retrieve function from URL lib dot request module', 'Data read and stored in a data frame using pandas', 'Basic statistical information obtained using the describe method']}], 'duration': 2268.72, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI13118952.jpg', 'highlights': ["The function 'loan EMI' is defined to calculate the equal monthly installment for the repayment of a loan based on the loan amount, duration, rate of interest, and down payment.", 'Pandas is typically used for working with tabular data, providing helper functions to read data from various file formats like CSVs, Excel, spreadsheets, HTML, JSON, and more.', 'Creating a generic read CSV function in Python, which can handle any CSV format and missing values in about 15-20 lines of code.', 'The read CSV function can parse any CSV format, not just the specific format of home loans, and handle missing values in about 15-20 lines of code.', 'The chapter explains how to read and process CSV files in Python, emphasizing the importance of closing files to free up system memory.']}, {'end': 17206.293, 
'segs': [{'end': 15441.518, 'src': 'embed', 'start': 15411.961, 'weight': 3, 'content': [{'end': 15413.122, 'text': "Let's say the 4th of April.", 'start': 15411.961, 'duration': 1.161}, {'end': 15418.314, 'text': 'Right? So a specific value within a specific column and a row.', 'start': 15414.352, 'duration': 3.962}, {'end': 15424.358, 'text': 'Now to do this, it might help to understand what the internal representation of data in a data frame looks like.', 'start': 15418.775, 'duration': 5.583}, {'end': 15430.329, 'text': 'So conceptually you can think of a data frame as a dictionary of lists.', 'start': 15425.365, 'duration': 4.964}, {'end': 15436.874, 'text': 'So you can think of a data frame as having a structure like this that the COVID data, a COVID DF data frame,', 'start': 15430.789, 'duration': 6.085}, {'end': 15441.518, 'text': 'is similar in structure to the to this COVID data dict, which is a dictionary.', 'start': 15436.874, 'duration': 4.644}], 'summary': 'Understanding data frames as dictionaries of lists for covid data', 'duration': 29.557, 'max_score': 15411.961, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI15411961.jpg'}, {'end': 15644.344, 'src': 'embed', 'start': 15616.755, 'weight': 4, 'content': [{'end': 15622.604, 'text': 'So we have been able to retrieve a full list of values in, in a particular column.', 'start': 15616.755, 'duration': 5.849}, {'end': 15626.152, 'text': 'Okay, And now each column is actually.', 'start': 15623.55, 'duration': 2.602}, {'end': 15634.017, 'text': "it's not really a list or an array, but each column is represented using a data structure called a series, which is essentially a numpy array,", 'start': 15626.152, 'duration': 7.865}, {'end': 15636.959, 'text': 'but it has some extra methods and properties associated with it.', 'start': 15634.017, 'duration': 2.942}, {'end': 15644.344, 'text': 'So if you check the type of COVID DF new cases, you will find 
that it has the type series and, just like arrays,', 'start': 15637.479, 'duration': 6.865}], 'summary': 'Retrieved full list of values in a column using a series data structure.', 'duration': 27.589, 'max_score': 15616.755, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI15616755.jpg'}, {'end': 16442.404, 'src': 'embed', 'start': 16411.485, 'weight': 1, 'content': [{'end': 16416.167, 'text': "So that's the data we have on Italy so far till about the first week of September.", 'start': 16411.485, 'duration': 4.682}, {'end': 16423.35, 'text': 'We might want to know, okay, what is the overall reported death rate, which is the ratio of reported deaths to reported cases.', 'start': 16417.128, 'duration': 6.222}, {'end': 16429.774, 'text': 'So that is simply the sum, the total number of deaths reported by the total number of cases.', 'start': 16423.731, 'duration': 6.043}, {'end': 16434.642, 'text': 'And you can see that is about 13%.', 'start': 16430.514, 'duration': 4.128}, {'end': 16442.404, 'text': 'Now that does not mean that 13% of people who contract the virus are going to suffer or die from it.', 'start': 16434.642, 'duration': 7.762}], 'summary': "Italy's reported death rate is about 13% based on data till the first week of september.", 'duration': 30.919, 'max_score': 16411.485, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI16411485.jpg'}, {'end': 16544.639, 'src': 'embed', 'start': 16518.147, 'weight': 0, 'content': [{'end': 16523.529, 'text': 'Then we have, we might want to find out what fraction of tests returned a positive result.', 'start': 16518.147, 'duration': 5.382}, {'end': 16528.432, 'text': 'So it turns out that the positive rate of total cases by total test that was about 5.21%.', 'start': 16523.95, 'duration': 4.482}, {'end': 16531.233, 'text': 'Right So about 5.21% of the tests in Italy led to a positive diagnosis.', 'start': 
16528.432, 'duration': 2.801}, {'end': 16540.203, 'text': 'Now this value may actually have varied month from month, and we are only looking at the overall value here.', 'start': 16534.955, 'duration': 5.248}, {'end': 16544.639, 'text': 'So this is how we can answer some basic questions about our data.', 'start': 16541.437, 'duration': 3.202}], 'summary': '5.21% of tests in italy led to a positive diagnosis.', 'duration': 26.492, 'max_score': 16518.147, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI16518147.jpg'}], 'start': 15387.852, 'title': 'Data analysis with pandas', 'summary': 'Explores retrieving, accessing, and analyzing data in a pandas data frame, including the representation format, accessing methods, nan values, insights into datasets, covid-19 data analysis in italy, and querying and sorting data, offering quantifiable details and methods for efficient analysis.', 'chapters': [{'end': 15682.8, 'start': 15387.852, 'title': 'Retrieving data from pandas data frame', 'summary': 'Explores retrieving data from a pandas data frame by understanding its internal structure as a dictionary of lists, the benefits of the representation format, and the efficient methods of accessing and retrieving specific values from the data frame.', 'duration': 294.948, 'highlights': ['The internal representation of a data frame is similar to a dictionary of lists, with keys as column headers and values as lists containing the column values, providing an efficient way to store and retrieve data.', 'Retrieving values for a particular row involves fetching the respective elements from each column, which is a very efficient operation, especially when the data types are consistent.', 'The representation of data in the Pandas data frame is more compact compared to other formats, such as a list of dictionaries, making it more space-efficient and performant.', 'Accessing a key using the indexing notation in the Pandas data frame 
retrieves all the values in a column, and each column is represented as a series, which is essentially a numpy array with additional methods and properties.', 'Specific values within the series can be retrieved using the indexing notation, allowing for efficient access to individual values within the data frame.']}, {'end': 16025.2, 'start': 15683.1, 'title': 'Accessing data in pandas', 'summary': 'Explains how to access specific elements in a data frame using methods like at, dot notation, and loc, and the implications of modifying data frames, with a focus on pandas optimization and the head and tail methods.', 'duration': 342.1, 'highlights': ["The at method in Pandas can be used to directly retrieve the value at a specific row and column, providing a more convenient way to access data (e.g., COVID.at[2, 'new cases'] returns 975).", 'Accessing columns as properties of the data frame using the dot notation in Pandas provides a more natural and efficient way to access data, including auto-completion of column names (e.g., COVID_DF.new cases).', "Pandas allows accessing a subset of a data frame by passing in a list of columns into the indexing notation, resulting in a smaller data frame with just those columns (e.g., COVID_DF[['date', 'new cases']]).", 'Pandas optimization allows creating new data frames and generating new columns without repeatedly copying the data, contributing to faster processing of large datasets.', 'The LOC method in Pandas can be used to access a specific row of data in a data frame, with the ability to retrieve multiple rows using the head and tail methods for viewing the first or last few rows of the data.']}, {'end': 16330.424, 'start': 16025.22, 'title': 'Accessing and analyzing data in pandas', 'summary': 'Discusses how nan values are used to represent missing data in a csv file, and provides insights into the dataset, including the start date of daily test reporting in italy and methods for accessing and analyzing data in a pandas 
dataframe.', 'duration': 305.204, 'highlights': ['The NaN values in the dataset represent that the daily test numbers were not reported on specific dates, with Italy starting to report daily tests after April 19th, 2020, and 935,310 tests conducted before that date.', "The first valid index for the 'new tests' column is 111, and it is important to check the values before and after this index to verify the transition from NaN to actual numbers.", 'The sample method can be used to retrieve a random sample of rows from the dataframe, providing a visual understanding of the data and retaining the original indices of the sampled rows.', 'Various methods for accessing and retrieving data from a pandas dataframe are discussed, including retrieving columns, getting values using indexing notation, accessing specific rows using LOC method, creating a copy of a dataframe, and using methods like head, tail, and sample to retrieve multiple rows of data.']}, {'end': 16540.203, 'start': 16330.444, 'title': 'Covid-19 data analysis in italy', 'summary': 'Explores covid-19 data analysis in italy, including the total reported cases and deaths, the reported death rate, and the overall number of tests conducted, highlighting the subtle details and providing quantifiable data.', 'duration': 209.759, 'highlights': ['The total number of reported cases in Italy is 271,515, with a total of 35,497 reported deaths related to COVID-19, as of the first week of September.', 'The overall reported death rate is approximately 13%, indicating the ratio of reported deaths to reported cases.', 'The total number of tests conducted in Italy is about 5.2 million, with a positive rate of approximately 5.21%.']}, {'end': 17206.293, 'start': 16541.437, 'title': 'Data analysis with pandas', 'summary': 'Covers querying and sorting data using pandas, including filtering rows based on criteria such as new cases, displaying all rows, creating a new column based on division of existing columns, and the importance 
of considering external context when analyzing the data.', 'duration': 664.856, 'highlights': ['Filtering rows based on criteria such as new cases', 'Creating a new column based on division of existing columns', 'Importance of considering external context when analyzing the data']}], 'duration': 1818.441, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI15387852.jpg', 'highlights': ['The total number of tests conducted in Italy is about 5.2 million, with a positive rate of approximately 5.21%', 'The overall reported death rate is approximately 13%, indicating the ratio of reported deaths to reported cases', 'The total number of reported cases in Italy is 271,515, with a total of 35,497 reported deaths related to COVID-19, as of the first week of September', 'The internal representation of a data frame is similar to a dictionary of lists, with keys as column headers and values as lists containing the column values, providing an efficient way to store and retrieve data', 'Accessing a key using the indexing notation in the Pandas data frame retrieves all the values in a column, and each column is represented as a series, which is essentially a numpy array with additional methods and properties']}, {'end': 18580.282, 'segs': [{'end': 17276.045, 'src': 'embed', 'start': 17229.623, 'weight': 0, 'content': [{'end': 17233.787, 'text': 'So it seems like the 22nd of March was the peak with 6, 557 reported cases that day.', 'start': 17229.623, 'duration': 4.164}, {'end': 17241.234, 'text': 'And then, and it was mostly the last couple of weeks of March.', 'start': 17237.273, 'duration': 3.961}, {'end': 17245.536, 'text': 'It seems like where most of the highest number of cases were reported.', 'start': 17241.294, 'duration': 4.242}, {'end': 17250.297, 'text': "Okay Now let's compare this with the days where the highest number of deaths were recorded.", 'start': 17245.976, 'duration': 4.321}, {'end': 17254.819, 'text': 'So 
here now we have COVID DF dot sort values, and here we are sorting by new deaths.', 'start': 17250.718, 'duration': 4.101}, {'end': 17261.561, 'text': "And once again, we're looking at the top 10 and it seems like the 28th of March was when there was the peak and then the 29th.", 'start': 17254.899, 'duration': 6.662}, {'end': 17265.643, 'text': 'So if you compare these two, it seems like overall that.', 'start': 17262.142, 'duration': 3.501}, {'end': 17271.444, 'text': 'The daily deaths hit a peak just about a week after the peak in the daily new cases.', 'start': 17266.483, 'duration': 4.961}, {'end': 17276.045, 'text': 'So that might lead you to, and then you can dig in a little more and read some literature.', 'start': 17271.764, 'duration': 4.281}], 'summary': 'Peak of 6,557 reported cases on 22nd march. deaths peaked about a week after cases.', 'duration': 46.422, 'max_score': 17229.623, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI17229623.jpg'}, {'end': 17340.751, 'src': 'embed', 'start': 17318.001, 'weight': 2, 'content': [{'end': 17325.81, 'text': "So as you might expect, we see a lot of days from the last year of December, and then we see a bunch of days in January, but here's one.", 'start': 17318.001, 'duration': 7.809}, {'end': 17329.214, 'text': 'So on the 20th of June, there seems to be the value minus 148.', 'start': 17325.97, 'duration': 3.244}, {'end': 17333.399, 'text': 'And now this is unexpected.', 'start': 17329.214, 'duration': 4.185}, {'end': 17336.543, 'text': 'You might not expect to see a negative number of cases in a day.', 'start': 17333.619, 'duration': 2.924}, {'end': 17339.831, 'text': 'But that is the nature of real data.', 'start': 17337.83, 'duration': 2.001}, {'end': 17340.751, 'text': 'This is what happens.', 'start': 17339.991, 'duration': 0.76}], 'summary': 'Unusual data point: on june 20th, there were -148 cases, highlighting the unpredictability of real data.', 
'duration': 22.75, 'max_score': 17318.001, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI17318001.jpg'}, {'end': 17892.364, 'src': 'heatmap', 'start': 17533.371, 'weight': 0.741, 'content': [{'end': 17538.297, 'text': 'So we say COVID DF dot at, and so one 72 is the row number or the row index.', 'start': 17533.371, 'duration': 4.926}, {'end': 17540.119, 'text': 'And then new cases is the column.', 'start': 17538.497, 'duration': 1.622}, {'end': 17542.422, 'text': 'And here we simply say that we want to.', 'start': 17540.64, 'duration': 1.782}, {'end': 17547.906, 'text': 'Take the average of the two values of the value above and below it in the data frame.', 'start': 17543.383, 'duration': 4.523}, {'end': 17549.307, 'text': 'All right.', 'start': 17549.007, 'duration': 0.3}, {'end': 17556.291, 'text': 'And with that, we have fixed or at least temporarily fixed the discrepancy in the data frame.', 'start': 17550.027, 'duration': 6.264}, {'end': 17556.812, 'text': 'All right.', 'start': 17556.572, 'duration': 0.24}, {'end': 17565.536, 'text': 'So these are just some ways in which we can filter out data and sort data, sort rows out of the data frame.', 'start': 17557.967, 'duration': 7.569}, {'end': 17569.441, 'text': 'So we looked at finding the sum of values in a column or a series.', 'start': 17565.556, 'duration': 3.885}, {'end': 17573.285, 'text': 'We looked at querying a subset of rows, satisfying a given criteria.', 'start': 17569.561, 'duration': 3.724}, {'end': 17577.249, 'text': 'We looked at adding new columns by combining data from existing columns.', 'start': 17573.866, 'duration': 3.383}, {'end': 17581.572, 'text': 'And then we also looked at removing one or more columns from the data frame using drop.', 'start': 17577.669, 'duration': 3.903}, {'end': 17584.974, 'text': 'And then we looked at sorting the rows of data using the column values.', 'start': 17581.932, 'duration': 3.042}, {'end': 
17588.517, 'text': 'And finally we looked at replacing a value within a data frame.', 'start': 17585.214, 'duration': 3.303}, {'end': 17592.8, 'text': 'And once again, this is not something that you need to remember by heart.', 'start': 17589.157, 'duration': 3.643}, {'end': 17599.125, 'text': 'On the other hand, you need not, and you can simply search for this online and you will be able to find the right resource.', 'start': 17593.741, 'duration': 5.384}, {'end': 17602.517, 'text': "Okay So now I've committed my notebook once again.", 'start': 17600.296, 'duration': 2.221}, {'end': 17607.741, 'text': 'So moving forward, as I just mentioned, this data is ordered by date.', 'start': 17602.978, 'duration': 4.763}, {'end': 17610.142, 'text': 'So we have, this is what is called time series data.', 'start': 17607.941, 'duration': 2.201}, {'end': 17614.465, 'text': 'So, while we have looked at the overall number for cases and tests and positive rates,', 'start': 17610.222, 'duration': 4.243}, {'end': 17621.61, 'text': 'it might also be useful to study these numbers on a month by month basis, because the because there is so much variation on the daily cases.', 'start': 17614.465, 'duration': 7.145}, {'end': 17626.033, 'text': 'So you might want to just drill down and look at a specific month or a specific week.', 'start': 17621.95, 'duration': 4.083}, {'end': 17629.293, 'text': 'Now to do that, the date column might come in handy here.', 'start': 17626.673, 'duration': 2.62}, {'end': 17637.315, 'text': 'And in fact, because so much data tends to have a date or a time associated with it, pandas provides many utilities for working with dates.', 'start': 17629.814, 'duration': 7.501}, {'end': 17637.915, 'text': 'All right.', 'start': 17637.635, 'duration': 0.28}, {'end': 17645.256, 'text': 'So if you just check the date column here, so it seems like this is a series as we might expect, but this has the D type or data type of object.', 'start': 17638.115, 'duration': 7.141}, 
{'end': 17649.817, 'text': 'And that means that pandas currently does not know that this column is a date.', 'start': 17645.776, 'duration': 4.041}, {'end': 17655.078, 'text': 'So what we might want to first do is convert this date time, convert this date column into.', 'start': 17650.137, 'duration': 4.941}, {'end': 17662.026, 'text': 'a new column, which has the data type of date time, and this can be done using pd.to_datetime.', 'start': 17656.059, 'duration': 5.967}, {'end': 17670.457, 'text': 'Okay, So we just say pd.to_datetime and then we give it a series and it takes the series and then it converts those data types into the date time format.', 'start': 17662.247, 'duration': 8.21}, {'end': 17673.275, 'text': "So it's an internal format within pandas.", 'start': 17671.434, 'duration': 1.841}, {'end': 17677.138, 'text': 'And then what we can do is we can take that and we can assign it to a new column.', 'start': 17673.916, 'duration': 3.222}, {'end': 17684.563, 'text': 'But instead of assigning it to a new column, we can simply assign it back to the same column, because once you have the date in the date time format,', 'start': 17677.999, 'duration': 6.564}, {'end': 17686.484, 'text': "then we don't really need the string date.", 'start': 17684.563, 'duration': 1.921}, {'end': 17689.787, 'text': 'We can always get it out from the date time format if we need it.', 'start': 17686.724, 'duration': 3.063}, {'end': 17695.851, 'text': 'So now if we check COVIDDF.date or COVIDDF indexed the date column.', 'start': 17690.307, 'duration': 5.544}, {'end': 17702.358, 'text': 'So you can see here that the data looks similar, but this time the date, the data type is datetime64.', 'start': 17696.491, 'duration': 5.867}, {'end': 17707.784, 'text': 'So this tracks day, this tracks time up to a particular nanosecond.', 'start': 17702.358, 'duration': 5.426}, {'end': 17716.535, 'text': "But since there was no time portion in the date, so you don't really see a 
time portion, but it is, it tracks time up to nanoseconds.", 'start': 17708.806, 'duration': 7.729}, {'end': 17726.282, 'text': 'Okay So we can now, what we can do is we can extract different parts of the date into separate columns using the DatetimeIndex class.', 'start': 17717.697, 'duration': 8.585}, {'end': 17728.644, 'text': 'So this is a really helpful class in pandas.', 'start': 17726.583, 'duration': 2.061}, {'end': 17735.228, 'text': 'So all you need to do is take the series or the column and pass it into pd.DatetimeIndex.', 'start': 17728.664, 'duration': 6.564}, {'end': 17737.75, 'text': 'So that creates an object of type DatetimeIndex.', 'start': 17735.548, 'duration': 2.202}, {'end': 17744.094, 'text': 'Now, if you want to extract a specific information, a piece of information out of it, you can do that.', 'start': 17738.51, 'duration': 5.584}, {'end': 17748.816, 'text': 'So if you want to get back the year, so we just do a dot year and then we can set,', 'start': 17744.474, 'duration': 4.342}, {'end': 17751.778, 'text': 'and that will return a series and we can set that into the year column.', 'start': 17748.816, 'duration': 2.962}, {'end': 17753.299, 'text': 'Similarly, we can set it.', 'start': 17752.239, 'duration': 1.06}, {'end': 17757.302, 'text': 'We can set a month column and we can set a day and a weekday column as well.', 'start': 17753.459, 'duration': 3.843}, {'end': 17758.742, 'text': 'All right.', 'start': 17758.462, 'duration': 0.28}, {'end': 17761.563, 'text': "So now we've just added four columns,", 'start': 17759.322, 'duration': 2.241}, {'end': 17768.586, 'text': 'simply by converting the date column into a date time data type and then passing it into the DatetimeIndex.', 'start': 17761.563, 'duration': 7.023}, {'end': 17771.307, 'text': "Okay And let's look at the data frame now.", 'start': 17768.786, 'duration': 2.521}, {'end': 17776.449, 'text': 'So we have the date as before we have, uh, the metrics that we had 
earlier, but now you can see year,', 'start': 17771.507, 'duration': 4.942}, {'end': 17778.951, 'text': 'month, day, and even the weekday.', 'start': 17776.989, 'duration': 1.962}, {'end': 17782.413, 'text': 'So the months go from one to 12, January to December.', 'start': 17778.991, 'duration': 3.422}, {'end': 17786.516, 'text': 'The day goes from one to 31 or whatever is the number of days in the month.', 'start': 17782.573, 'duration': 3.943}, {'end': 17789.778, 'text': 'And the weekday represents Monday to Sunday.', 'start': 17786.916, 'duration': 2.862}, {'end': 17792.72, 'text': 'So Monday is zero and Sunday is six.', 'start': 17790.099, 'duration': 2.621}, {'end': 17794.582, 'text': 'And then you can guess the values in between.', 'start': 17792.821, 'duration': 1.761}, {'end': 17795.863, 'text': 'All right.', 'start': 17795.603, 'duration': 0.26}, {'end': 17798.485, 'text': 'So now that we have the month related information.', 'start': 17796.283, 'duration': 2.202}, {'end': 17802.628, 'text': 'So let us track the overall metrics for the month of May.', 'start': 17799.165, 'duration': 3.463}, {'end': 17807.616, 'text': 'For this, we may, we are actually going to do it in a few steps.', 'start': 17803.894, 'duration': 3.722}, {'end': 17810.677, 'text': 'So let me, let us just see all the intermediate steps here.', 'start': 17808.156, 'duration': 2.521}, {'end': 17813.778, 'text': 'First of all, we may want to query the rows for May.', 'start': 17811.557, 'duration': 2.221}, {'end': 17816.66, 'text': "So here what we're saying is COVID DF.", 'start': 17814.779, 'duration': 1.881}, {'end': 17826.004, 'text': 'May is a COVID DF, and then we are indexing into it and passing a Boolean expression where the month column is equal to five,', 'start': 17816.66, 'duration': 9.344}, {'end': 17827.164, 'text': "and let's check the value of it.", 'start': 17826.004, 'duration': 1.16}, {'end': 17829.926, 'text': 'So there you go about.', 'start': 17828.325, 'duration': 
1.601}, {'end': 17836.481, 'text': 'It looks like there are, yeah, so there are 31 values or yeah, there are 31 values here and you can count it as well.', 'start': 17830.811, 'duration': 5.67}, {'end': 17841.129, 'text': 'So there are 31 values for, and this contains the data only for May.', 'start': 17837.062, 'duration': 4.067}, {'end': 17842.571, 'text': 'Now the next step now.', 'start': 17841.63, 'duration': 0.941}, {'end': 17848.396, 'text': 'is to get this, get the sum for each column, right? So we want to sum for new cases.', 'start': 17843.653, 'duration': 4.743}, {'end': 17853.398, 'text': 'We want to sum for new deaths and we want to sum for new tests, but instead of doing it column by column,', 'start': 17848.436, 'duration': 4.962}, {'end': 17856.96, 'text': 'we can also do it all at once on the entire data frame.', 'start': 17853.398, 'duration': 3.562}, {'end': 17863.884, 'text': 'But to do it all at once, we may want to first exclude the columns which we do not want to sum over,', 'start': 17857.441, 'duration': 6.443}, {'end': 17866.605, 'text': 'because for some of these columns the sum may not make sense.', 'start': 17863.884, 'duration': 2.721}, {'end': 17869.267, 'text': 'So we can simply extract just these three columns.', 'start': 17866.966, 'duration': 2.301}, {'end': 17871.088, 'text': "So let's do that.", 'start': 17870.447, 'duration': 0.641}, {'end': 17881.999, 'text': 'So we say we create another data frame called COVID-DF-May metrics, where we pass COVID-DF-May and from it we simply extract the columns new cases,', 'start': 17872.455, 'duration': 9.544}, {'end': 17883.24, 'text': 'new deaths and new tests.', 'start': 17881.999, 'duration': 1.241}, {'end': 17889.263, 'text': 'So here we are using the indexing notation and instead of giving it a single column, we are giving it a list of columns.', 'start': 17884.02, 'duration': 5.243}, {'end': 17892.364, 'text': 'And now you can check the value of COVID-DF-May metrics.', 'start': 
17889.883, 'duration': 2.481}], 'summary': "Data frame manipulation: filtering, sorting, adding, converting date column, and extracting date-related information. calculating sum for specific month's data.", 'duration': 358.993, 'max_score': 17533.371, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI17533371.jpg'}, {'end': 17677.138, 'src': 'embed', 'start': 17650.137, 'weight': 3, 'content': [{'end': 17655.078, 'text': 'So what we might want to first do is convert this date time, convert this date column into.', 'start': 17650.137, 'duration': 4.941}, {'end': 17662.026, 'text': 'a new column, which has the data type of date time, and this can be done using pd.to_datetime.', 'start': 17656.059, 'duration': 5.967}, {'end': 17670.457, 'text': 'Okay, So we just say pd.to_datetime and then we give it a series and it takes the series and then it converts those data types into the date time format.', 'start': 17662.247, 'duration': 8.21}, {'end': 17673.275, 'text': "So it's an internal format within pandas.", 'start': 17671.434, 'duration': 1.841}, {'end': 17677.138, 'text': 'And then what we can do is we can take that and we can assign it to a new column.', 'start': 17673.916, 'duration': 3.222}], 'summary': 'Convert date column to datetime format using pd.to_datetime.', 'duration': 27.001, 'max_score': 17650.137, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI17650137.jpg'}, {'end': 18144.281, 'src': 'embed', 'start': 18115.905, 'weight': 5, 'content': [{'end': 18119.947, 'text': 'And we wanted to summarize it and we want to summarize it on a month wise level.', 'start': 18115.905, 'duration': 4.042}, {'end': 18129.911, 'text': 'So instead of having the new cases, new tests and new deaths on a daily basis, we might want to see them on a weekly basis or on a monthly basis.', 'start': 18120.327, 'duration': 9.584}, {'end': 18132.753, 'text': 'And 
this is again, a very common use case often.', 'start': 18129.991, 'duration': 2.762}, {'end': 18136.295, 'text': 'the data, and this is called the granularity of data.', 'start': 18133.433, 'duration': 2.862}, {'end': 18144.281, 'text': 'So often you may have data that is collected per millisecond, or you may have data that is collected per second or per minute or per hour per day.', 'start': 18136.655, 'duration': 7.626}], 'summary': 'Summarize covid-19 data on monthly basis for better analysis.', 'duration': 28.376, 'max_score': 18115.905, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI18115905.jpg'}, {'end': 18371.141, 'src': 'embed', 'start': 18343.94, 'weight': 4, 'content': [{'end': 18347.742, 'text': 'And then we had selected three columns, new cases, new deaths and new tests.', 'start': 18343.94, 'duration': 3.802}, {'end': 18353.807, 'text': 'So now for each of these groups, we, so these are the three columns that show up in the result.', 'start': 18348.263, 'duration': 5.544}, {'end': 18358.21, 'text': 'And for each of these groups, we have then performed an aggregation using some.', 'start': 18354.347, 'duration': 3.863}, {'end': 18362.634, 'text': "So that's where you can see that the total number of cases reported in January was three,", 'start': 18358.65, 'duration': 3.984}, {'end': 18368.679, 'text': 'and then it grew to 885 and then it grew to a hundred thousand and it remained at a hundred thousand in April.', 'start': 18362.634, 'duration': 6.045}, {'end': 18371.141, 'text': 'And then it started going down in the month of May.', 'start': 18368.739, 'duration': 2.402}], 'summary': 'Data analysis shows new cases grew from 3 to 100,000 in april, then declined in may.', 'duration': 27.201, 'max_score': 18343.94, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI18343940.jpg'}, {'end': 18409.829, 'src': 'embed', 'start': 18382.971, 
'weight': 6, 'content': [{'end': 18388.415, 'text': 'And already you can see here that simply by grouping and aggregating a data onto a monthly basis,', 'start': 18382.971, 'duration': 5.444}, {'end': 18394.06, 'text': 'we can already make some more inferences than we were able to do previously using just the date level data.', 'start': 18388.415, 'duration': 5.645}, {'end': 18396.982, 'text': 'And that is really the whole purpose of data analysis.', 'start': 18394.46, 'duration': 2.522}, {'end': 18402.726, 'text': 'What we are doing is we are transforming data from one form into another, whether it is aggregating it, whether it is.', 'start': 18397.082, 'duration': 5.644}, {'end': 18409.829, 'text': 'creating new columns, whether it is creating new groups or whether it is sorting in a certain way.', 'start': 18403.467, 'duration': 6.362}], 'summary': 'Grouping and aggregating data monthly enhances data analysis.', 'duration': 26.858, 'max_score': 18382.971, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI18382971.jpg'}, {'end': 18543.526, 'src': 'embed', 'start': 18513.868, 'weight': 7, 'content': [{'end': 18517.091, 'text': 'So there are a lot of ways to aggregate, but we are aggregating by mean.', 'start': 18513.868, 'duration': 3.223}, {'end': 18519.293, 'text': 'And when we do that, we can see here that.', 'start': 18517.572, 'duration': 1.721}, {'end': 18524.997, 'text': 'For each weekday, we can see how many new cases on average were found.', 'start': 18520.114, 'duration': 4.883}, {'end': 18530.54, 'text': 'So on average, 1100 cases on Mondays and 1200 cases on Sundays.', 'start': 18525.077, 'duration': 5.463}, {'end': 18534.962, 'text': 'And it seems like on Tuesday and Wednesdays, when on average you had the lowest number of cases.', 'start': 18530.64, 'duration': 4.322}, {'end': 18536.343, 'text': 'All right.', 'start': 18536.102, 'duration': 0.241}, {'end': 18539.464, 'text': 'And then you might 
want to now dig in and investigate that.', 'start': 18536.723, 'duration': 2.741}, {'end': 18543.526, 'text': 'Why is this the case? Maybe it has something to do with the testing methodology.', 'start': 18539.844, 'duration': 3.682}], 'summary': 'Aggregated by mean, average 1100 cases on mondays, 1200 on sundays. fewer cases on tuesdays and wednesdays. investigate testing methodology.', 'duration': 29.658, 'max_score': 18513.868, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI18513868.jpg'}], 'start': 17207.526, 'title': 'Data analysis in pandas', 'summary': 'Covers data manipulation using pandas for covid-19 cases and deaths, handling time series data, summarizing data granularity, and analyzing data aggregation to gain insights such as lowest average cases reported on specific weekdays and correlation between tests and reported cases.', 'chapters': [{'end': 17592.8, 'start': 17207.526, 'title': 'Data analysis with pandas', 'summary': 'Covers data manipulation using pandas, highlighting the analysis of covid-19 cases and deaths, including the peak dates and discrepancies in the data, along with strategies for dealing with faulty values.', 'duration': 385.274, 'highlights': ['The peak date for new COVID-19 cases was 22nd of March with 6,557 reported cases, followed by the last weeks of March with the highest number of cases.', 'The peak date for COVID-19 deaths was 28th of March, occurring about a week after the peak in new cases, indicating a potential timeline for severe illness and recovery.', 'Identifying and addressing discrepancies in the data, such as a negative value for new cases on 20th of June, by considering potential data entry errors and strategies for handling faulty values.']}, {'end': 18071.308, 'start': 17593.741, 'title': 'Handling time series data in pandas', 'summary': 'Explains how to handle time series data in pandas, including converting date columns to datetime format, extracting 
specific date information, and analyzing metrics on a month by month basis, with examples such as summing metrics for a specific month and comparing average cases reported on Sundays to the overall average.', 'duration': 477.567, 'highlights': ['Converting date columns into datetime format using pd.to_datetime and extracting specific date information by passing it into the DatetimeIndex class, adding columns for year, month, day, and weekday.', 'Summing metrics for a specific month by querying rows based on the month and extracting the sum for new cases, new deaths, and new tests, both individually and all at once on the entire data frame.', 'Comparing average cases reported on Sundays to the overall average by calculating the overall average number of cases per day and the average number of cases on Sundays.']}, {'end': 18382.371, 'start': 18071.788, 'title': 'Summarizing COVID-19 data granularity', 'summary': 'Discusses the importance of investigating data, changing data granularity to summarize COVID-19 data on a monthly basis, using the groupby method to aggregate data, and identifying potential trends in the data, such as a potential second wave of COVID-19 cases in August.', 'duration': 310.583, 'highlights': ['Explaining the concept of changing data granularity to summarize COVID-19 data on a monthly basis', 'Demonstrating the use of the groupby method to aggregate COVID-19 data', 'Identifying potential trends in the COVID-19 data, such as a potential second wave of cases in August', 'Emphasizing the importance of investigating data and sharing interesting insights']}, {'end': 18580.282, 'start': 18382.971, 'title': 'Data analysis and aggregation', 'summary': 'Discusses the process of transforming data through grouping and aggregating to gain more inferences, along with using mean aggregation to analyze average number of cases reported on each weekday, with an emphasis on the lowest average cases reported on Tuesday and Wednesday, and the correlation 
between average number of tests and reported cases.', 'duration': 197.311, 'highlights': ['The purpose of data analysis is to transform data from one form into another, such as aggregating, creating new columns, creating new groups, or sorting, to gather more inferences about the data.', 'The chapter introduces the use of mean aggregation to find the average number of cases reported on every weekday, with the lowest average cases on Tuesday and Wednesday, and the highest average tests on Fridays.', 'The correlation between the average number of tests and reported cases is discussed, suggesting that it took about two days for the test results to come out.']}], 'duration': 1372.756, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI17207526.jpg', 'highlights': ['The peak date for new COVID-19 cases was 22nd of March with 6,557 reported cases', 'The peak date for COVID-19 deaths was 28th of March, occurring about a week after the peak in new cases', 'Identifying and addressing discrepancies in the data, such as a negative value for new cases on 20th of June', 'Converting date columns into datetime format using PD.to_datetime and extracting specific date information', 'Summing metrics for a specific month by querying rows based on the month and extracting the sum for new cases, new deaths, and new tests', 'Explaining the concept of changing data granularity to summarize COVID-19 data on a monthly basis', 'The purpose of data analysis is to transform data from one form into another, such as aggregating, creating new columns, creating new groups, or sorting', 'The chapter introduces the use of mean aggregation to find the average number of cases reported on every weekday', 'The correlation between the average number of tests and reported cases is discussed']}, {'end': 20235.832, 'segs': [{'end': 18652.707, 'src': 'embed', 'start': 18627.714, 'weight': 1, 'content': [{'end': 18635.081, 'text': "So let's just call cum 
sum on new cases and set that to set that into a new column, total cases.", 'start': 18627.714, 'duration': 7.367}, {'end': 18638.204, 'text': 'And similarly, we do that for the deaths and the tests as well.', 'start': 18635.541, 'duration': 2.663}, {'end': 18644.962, 'text': "And now what we've done is by the way, for tests, we have also included the initial test count.", 'start': 18639.298, 'duration': 5.664}, {'end': 18651.126, 'text': 'So this is again to account for, there were a lot of date where daily tests were not reported.', 'start': 18645.742, 'duration': 5.384}, {'end': 18652.707, 'text': 'So we have accounted for that as well.', 'start': 18651.166, 'duration': 1.541}], 'summary': 'Applied cumulative sum to new cases, deaths, and tests, accounting for unreported daily tests.', 'duration': 24.993, 'max_score': 18627.714, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI18627714.jpg'}, {'end': 18733.917, 'src': 'embed', 'start': 18700.413, 'weight': 2, 'content': [{'end': 18703.335, 'text': 'you might need to merge data from multiple sources.', 'start': 18700.413, 'duration': 2.922}, {'end': 18709.82, 'text': "So you might want to calculate, let's say we want to calculate a metric like tests per million or cases per million.", 'start': 18703.835, 'duration': 5.985}, {'end': 18713.303, 'text': 'And so for this, we require some more information about the country.', 'start': 18710.34, 'duration': 2.963}, {'end': 18716.312, 'text': "Specifically it's population in this case.", 'start': 18714.392, 'duration': 1.92}, {'end': 18724.615, 'text': "Okay, So let's download another file, locations.csv, which contains the health related information for different countries around the world,", 'start': 18716.733, 'duration': 7.882}, {'end': 18725.355, 'text': 'including Italy.', 'start': 18724.615, 'duration': 0.74}, {'end': 18727.475, 'text': 'Okay So we have URL retrieve here.', 'start': 18725.755, 'duration': 
1.72}, {'end': 18733.917, 'text': "We, and this is where we're passing in the full URL and we are passing in a CSV file name where we want to download this data.", 'start': 18727.495, 'duration': 6.422}], 'summary': 'Merge data from multiple sources to calculate metrics like tests per million or cases per million, requiring population information. download health-related information for different countries including italy from locations.csv.', 'duration': 33.504, 'max_score': 18700.413, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI18700413.jpg'}, {'end': 19487.75, 'src': 'embed', 'start': 19455.181, 'weight': 3, 'content': [{'end': 19458.322, 'text': 'And then to combine them, either you have to create a new table.', 'start': 19455.181, 'duration': 3.141}, {'end': 19460.303, 'text': 'Now pandas lets you avoid all that.', 'start': 19458.342, 'duration': 1.961}, {'end': 19469.127, 'text': 'You can pull data from any format into a pandas data frame which is held in memory and then do all and combine them and then perform all these operations.', 'start': 19460.383, 'duration': 8.744}, {'end': 19470.047, 'text': "So that's one benefit.", 'start': 19469.147, 'duration': 0.9}, {'end': 19474.72, 'text': 'Then the other thing is that SQL is a very limited language.', 'start': 19471.277, 'duration': 3.443}, {'end': 19481.905, 'text': 'So Python being a general purpose language, you can write functions, you can write, you can reuse the code.', 'start': 19475.4, 'duration': 6.505}, {'end': 19487.75, 'text': 'You can use a pandas library, and then there is a whole ecosystem of thousands of libraries that you can use,', 'start': 19483.046, 'duration': 4.704}], 'summary': 'Pandas simplifies data manipulation, enables code reusability, and offers access to a vast library ecosystem.', 'duration': 32.569, 'max_score': 19455.181, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI19455181.jpg'}, {'end': 19563.283, 'src': 'embed', 'start': 19534.98, 'weight': 0, 'content': [{'end': 19539.023, 'text': 'but now we want to do this similar analysis for a hundred or 200 other countries.', 'start': 19534.98, 'duration': 4.043}, {'end': 19544.807, 'text': 'Now at that point, it might get a little difficult to do that in Excel because you cannot really.', 'start': 19539.483, 'duration': 5.324}, {'end': 19552.634, 'text': 'You cannot really capture all of the analysis that we did into some basic Excel formula or even like an Excel macro.', 'start': 19545.888, 'duration': 6.746}, {'end': 19555.636, 'text': 'So that is where again, pandas might be useful.', 'start': 19553.114, 'duration': 2.522}, {'end': 19563.283, 'text': 'You can take all the analysis we have done, put that into a function and use that function to analyze hundreds of files.', 'start': 19555.676, 'duration': 7.607}], 'summary': 'Use pandas to analyze data for hundreds of countries efficiently.', 'duration': 28.303, 'max_score': 19534.98, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI19534980.jpg'}, {'end': 19606.873, 'src': 'embed', 'start': 19577.352, 'weight': 4, 'content': [{'end': 19580.494, 'text': 'You can draw graphs within a Jupyter notebook and it all.', 'start': 19577.352, 'duration': 3.142}, {'end': 19583.195, 'text': 'it has this sequential order, which is so.', 'start': 19580.494, 'duration': 2.701}, {'end': 19591.42, 'text': "it's also very well suited for a narrative or a storytelling flow, where you can present all of your findings in a structured format and.", 'start': 19583.195, 'duration': 8.225}, {'end': 19594.46, 'text': 'You know, there are ways where you can hide the code and so on.', 'start': 19592.398, 'duration': 2.062}, {'end': 19598.484, 'text': 'So I would say that Jupyter is actually a good way to
present.', 'start': 19594.981, 'duration': 3.503}, {'end': 19602.749, 'text': 'And what you can do is you can simply take the Jovian URL and you can share that.', 'start': 19599.005, 'duration': 3.744}, {'end': 19606.873, 'text': 'And that is a good way to just present, create a report and share it.', 'start': 19603.169, 'duration': 3.704}], 'summary': 'Jupyter notebook allows for structured presentation of findings and easy sharing via jovian url.', 'duration': 29.521, 'max_score': 19577.352, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI19577352.jpg'}], 'start': 18581.383, 'title': 'Utilizing pandas and jupyter for data analysis', 'summary': 'Covers cumulative aggregation methods for daily covid-19 data, merging data from different sources, calculating metrics like cases per million and tests per million, writing data from pandas to csv, and the benefits of using pandas and jupyter for scalable, narrative-driven data analysis.', 'chapters': [{'end': 18700.413, 'start': 18581.383, 'title': 'Cumulative aggregation in data analysis', 'summary': 'Covers the use of cumulative methods like cum sum to calculate running total of new cases, tests, and deaths on a daily basis, including accounting for nan values, providing a comprehensive view of the dataset.', 'duration': 119.03, 'highlights': ['The chapter covers the use of cumulative methods like cum sum to calculate running total of new cases, tests, and deaths on a daily basis, including accounting for NAN values, providing a comprehensive view of the dataset.', 'The cumulative method allows accumulation of data on a row by row level, including cumulative maximum and cumulative mean, offering various ways to analyze trends in the dataset.', 'The total cases, total deaths, and total tests are calculated by applying cum sum on new cases, deaths, and tests respectively, providing a complete overview of the dataset.', 'NAN values in the cumulative values remain as 
NAN until a single NAN value is encountered, impacting the cumulative result, and the total number of tests includes the initial test count to account for unreported daily tests.']}, {'end': 19058.716, 'start': 18700.413, 'title': 'Merging and calculating metrics', 'summary': "Explains how to merge data from different sources using python's merge operation, and calculate metrics like cases per million and tests per million, emphasizing the importance of per capita comparison in analyzing country data.", 'duration': 358.303, 'highlights': ["The chapter explains how to merge data from different sources using Python's merge operation, and calculate metrics like cases per million and tests per million.", 'Emphasizes the importance of per capita comparison in analyzing country data.']}, {'end': 19512.517, 'start': 19059.077, 'title': 'Writing data from pandas to csv', 'summary': 'Covers the process of saving data from a pandas data frame to a csv file, emphasizing the creation of a smaller data frame, using the to_csv method, and the benefits of using pandas over sql for data analysis.', 'duration': 453.44, 'highlights': ['Pandas allows the creation of a smaller data frame to filter out unnecessary columns, reducing redundancy and optimizing the data for writing to a file.', 'The to_csv method is used to save the data frame to a CSV file, with the option to specify the file name and handle the index inclusion in the output.', 'Pandas offers the advantage of flexibility and ease in combining data from multiple sources into a data frame, enabling efficient data analysis without the limitations of SQL.']}, {'end': 20235.832, 'start': 19513.057, 'title': 'Benefits of pandas and jupyter for data analysis', 'summary': 'Discusses the benefits of using pandas and jupyter for data analysis, including the ability to scale analysis for multiple countries, create narrative flow for presentations, and visualize trends in covid-19 data, demonstrating the advantages of programming in 
python for data analysis.', 'duration': 722.775, 'highlights': ['Pandas enables scaling analysis for multiple countries by using functions to analyze hundreds of files, demonstrating the advantage of using pandas over Excel for large-scale data analysis.', 'Jupyter notebook provides a structured format for presentations, allowing the addition of explanations, graphs, and narrative flow, making it a better tool for creating and sharing reports compared to traditional methods like Excel.', 'Plotting data using pandas and Jupyter enables visualization and analysis of trends, such as the number of COVID-19 cases and deaths on a day-to-day basis, providing valuable insights for understanding trends and making informed decisions.']}], 'duration': 1654.449, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI18581383.jpg', 'highlights': ['Pandas enables scaling analysis for multiple countries using functions to analyze hundreds of files, demonstrating the advantage of using pandas over Excel for large-scale data analysis.', 'The chapter covers the use of cumulative methods like cum sum to calculate running total of new cases, tests, and deaths on a daily basis, including accounting for NAN values, providing a comprehensive view of the dataset.', "The chapter explains how to merge data from different sources using Python's merge operation, and calculate metrics like cases per million and tests per million.", 'Pandas offers the advantage of flexibility and ease in combining data from multiple sources into a data frame, enabling efficient data analysis without the limitations of SQL.', 'Jupyter notebook provides a structured format for presentations, allowing the addition of explanations, graphs, and narrative flow, making it a better tool for creating and sharing reports compared to traditional methods like Excel.']}, {'end': 22612.537, 'segs': [{'end': 20324.969, 'src': 'embed', 'start': 20298.448, 'weight': 1, 
'content': [{'end': 20304.074, 'text': 'And the objective of this assignment is essentially to gain some hands-on experience with the pandas library.', 'start': 20298.448, 'duration': 5.626}, {'end': 20307.659, 'text': 'So the things that we learned today, you will get to apply them.', 'start': 20305.076, 'duration': 2.583}, {'end': 20315.327, 'text': 'So things like creating data frames, querying and indexing operations, grouping, merging and aggregation, and dealing with missing values.', 'start': 20308.059, 'duration': 7.268}, {'end': 20318.671, 'text': 'And to get started, you can simply open up the starter notebook.', 'start': 20315.988, 'duration': 2.683}, {'end': 20324.969, 'text': 'So here is a starter notebook and it contains the information that you need to work on the assignment.', 'start': 20319.844, 'duration': 5.125}], 'summary': 'Gain hands-on experience with pandas library including data frames, querying, indexing, grouping, merging, aggregation, and handling missing values.', 'duration': 26.521, 'max_score': 20298.448, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI20298448.jpg'}, {'end': 20750.163, 'src': 'embed', 'start': 20723.041, 'weight': 0, 'content': [{'end': 20726.544, 'text': 'And then the second part is to merge this data with some COVID-19 stats.', 'start': 20723.041, 'duration': 3.503}, {'end': 20730.327, 'text': 'So once again, there is this file called COVID-19 countries data.', 'start': 20726.964, 'duration': 3.363}, {'end': 20734.57, 'text': 'So here for each country, we have the total cases, total deaths and total tests.', 'start': 20730.687, 'duration': 3.883}, {'end': 20737.553, 'text': 'And a lot of countries, the total tests are actually not reported.', 'start': 20734.791, 'duration': 2.762}, {'end': 20738.774, 'text': "So that's where you see NANS.", 'start': 20737.593, 'duration': 1.181}, {'end': 20740.875, 'text': "And that's what the first question is.", 'start': 
20739.494, 'duration': 1.381}, {'end': 20744.578, 'text': 'Count the number of countries where the total tests data is missing.', 'start': 20741.376, 'duration': 3.202}, {'end': 20748.001, 'text': 'So just use the isna method to do this.', 'start': 20745.299, 'duration': 2.702}, {'end': 20750.163, 'text': 'That is a hint here for you.', 'start': 20749.082, 'duration': 1.081}], 'summary': 'Merge data with COVID-19 stats and count countries with missing total tests.', 'duration': 27.122, 'max_score': 20723.041, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI20723041.jpg'}, {'end': 20966.774, 'src': 'embed', 'start': 20942.134, 'weight': 2, 'content': [{'end': 20947.619, 'text': '20 countries with lowest number of hospital beds, and then count the number of countries which feature in both lists.', 'start': 20942.134, 'duration': 5.485}, {'end': 20954.124, 'text': 'And then you can see what kind of a relation GDP has with healthcare facilities in a country.', 'start': 20947.699, 'duration': 6.425}, {'end': 20963.19, 'text': 'Okay. And once you solve each question, just keep committing.', 'start': 20955.262, 'duration': 7.928}, {'end': 20966.774, 'text': 'And then when you are ready to submit, just take the link.', 'start': 20963.57, 'duration': 3.204}], 'summary': "Analyze 20 countries with lowest hospital beds and their gdp's relation to healthcare facilities.", 'duration': 24.64, 'max_score': 20942.134, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI20942134.jpg'}, {'end': 21364.61, 'src': 'embed', 'start': 21337.471, 'weight': 3, 'content': [{'end': 21344.717, 'text': 'what we are going to look at is some popular data visualization techniques and understand how to implement them using Python libraries,', 'start': 21337.471, 'duration': 7.246}, {'end': 21346.038, 'text': 'Matplotlib and Seaborn.', 
'start': 21344.717, 'duration': 1.321}, {'end': 21349.02, 'text': "So to begin with let's import the libraries.", 'start': 21346.838, 'duration': 2.182}, {'end': 21352.021, 'text': "So we'll import matplotlib.pyplot.", 'start': 21349.42, 'duration': 2.601}, {'end': 21355.183, 'text': 'That is the module that is used for doing most of the basic plotting.', 'start': 21352.101, 'duration': 3.082}, {'end': 21364.61, 'text': "So we'll import matplotlib.pyplot with an alias of PLT, because that is the most common alias that is used in the data science domain.", 'start': 21355.584, 'duration': 9.026}], 'summary': 'Popular data visualization techniques using python libraries matplotlib and seaborn are demonstrated.', 'duration': 27.139, 'max_score': 21337.471, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI21337471.jpg'}, {'end': 22192.814, 'src': 'heatmap', 'start': 21823.385, 'weight': 0.792, 'content': [{'end': 21831.591, 'text': "You may get, you may form a hypothesis that it's probable that maybe the demand for oranges is falling or the demand for apples is growing.", 'start': 21823.385, 'duration': 8.206}, {'end': 21837.436, 'text': 'So maybe some of the land that was being used for apples is now being used to grow oranges or something like that.', 'start': 21831.691, 'duration': 5.745}, {'end': 21843.881, 'text': 'And once you have a hypothesis like that, then you can go back and investigate and find out if this is indeed the case.', 'start': 21837.636, 'duration': 6.245}, {'end': 21846.283, 'text': 'This is great, but.', 'start': 21844.902, 'duration': 1.381}, {'end': 21850.92, 'text': "Just by looking at this graph, it's not quite clear which line represents what.", 'start': 21847.458, 'duration': 3.462}, {'end': 21856.823, 'text': "So we already knew this was apples and that's how we could tell, but you can actually include what is called a legend.", 'start': 21851.06, 'duration': 5.763}, {'end': 
21865.088, 'text': 'So you can simply say plt.legend and into the legend, you can give a list of labels for each of the lines.', 'start': 21857.103, 'duration': 7.985}, {'end': 21870.951, 'text': 'So the labels will get applied to each line and then you have the title.', 'start': 21865.548, 'duration': 5.403}, {'end': 21874.733, 'text': 'This title is applied to the entire chart as a whole.', 'start': 21872.091, 'duration': 2.642}, {'end': 21881.87, 'text': 'So now you see here, you can see a title, Crop Yields in Kanto, at the top of the chart, and then you have the apples and oranges.', 'start': 21875.834, 'duration': 6.036}, {'end': 21886.429, 'text': 'And then you have the yield in tons per hectare.', 'start': 21883.208, 'duration': 3.221}, {'end': 21894.811, 'text': 'Now, as we add more lines, sometimes it can get difficult to understand where the points exactly are.', 'start': 21887.149, 'duration': 7.662}, {'end': 21900.513, 'text': 'So for instance, if we want to do the data for 2005, we have to estimate it a little bit.', 'start': 21894.851, 'duration': 5.662}, {'end': 21902.734, 'text': 'So we can tell that 2005 is around here.', 'start': 21900.553, 'duration': 2.181}, {'end': 21905.395, 'text': 'And so the value is probably this one.', 'start': 21902.954, 'duration': 2.441}, {'end': 21907.855, 'text': 'And that is about 0.908 or something like that.', 'start': 21905.855, 'duration': 2}, {'end': 21916.799, 'text': "It's not very clear sometimes, especially in a graph like this, which seems to be growing linearly, where the points exactly lie.", 'start': 21910.516, 'duration': 6.283}, {'end': 21919.92, 'text': 'And this is where you can actually add markers to the points.', 'start': 21917.259, 'duration': 2.661}, {'end': 21930.565, 'text': 'So to show markers for the data points we can simply use the marker argument of plt.plot, and matplotlib supports many different kinds of markers,', 'start': 21920.501, 'duration': 
10.064}, {'end': 21934.287, 'text': 'like circles, cross square, diamonds and so on.', 'start': 21930.565, 'duration': 3.722}, {'end': 21936.528, 'text': 'And you can actually see a full list of markers here.', 'start': 21934.407, 'duration': 2.121}, {'end': 21941.553, 'text': 'So if you visit this page, you can, or you just search for matplotlib markers.', 'start': 21937.711, 'duration': 3.842}, {'end': 21947.096, 'text': 'You can see what the value that you need to provide to the marker argument and the symbol it displays on the screen.', 'start': 21941.993, 'duration': 5.103}, {'end': 21951.413, 'text': "Okay So let's try out a couple of examples.", 'start': 21947.116, 'duration': 4.297}, {'end': 21957.117, 'text': 'Now we are going to plot the yield of apples and we are going to use a circular marker for it.', 'start': 21951.733, 'duration': 5.384}, {'end': 21960.12, 'text': 'So apart from the line, there will be a circular marker.', 'start': 21957.678, 'duration': 2.442}, {'end': 21966.906, 'text': 'And then for the yield of oranges, we are going to use an X marker or a cross marker.', 'start': 21960.24, 'duration': 6.666}, {'end': 21968.407, 'text': "So let's run this.", 'start': 21967.706, 'duration': 0.701}, {'end': 21976.353, 'text': "So you can see now you have circular markers here, and then you have a cross markers here, and they're also represented in the legend.", 'start': 21969.708, 'duration': 6.645}, {'end': 21982.001, 'text': 'So this is a little better, but you might want to style these lines and markers.', 'start': 21977.455, 'duration': 4.546}, {'end': 21982.602, 'text': 'Suppose you want to.', 'start': 21982.021, 'duration': 0.581}, {'end': 21988.645, 'text': 'Uh, use these in a presentation, then you might want to include some of your brand colors and backgrounds and things like that.', 'start': 21983.542, 'duration': 5.103}, {'end': 21997.67, 'text': 'So to do the styling, uh, to style the lines in the markers, PLT dot plot, the plot 
function supports many arguments for styling them.', 'start': 21989.065, 'duration': 8.605}, {'end': 22001.672, 'text': 'So you can use this color or C argument to set the color of the line.', 'start': 21998.11, 'duration': 3.562}, {'end': 22008.196, 'text': 'And once again, there are many inbuilt colors that are supported within matplotlib.', 'start': 22002.152, 'duration': 6.044}, {'end': 22011.618, 'text': 'So here you can just search for list of colors in matplotlib.', 'start': 22008.616, 'duration': 3.002}, {'end': 22015.744, 'text': 'And you can see here that there are a lot of colors already supported.', 'start': 22012.398, 'duration': 3.346}, {'end': 22024.08, 'text': 'And if you want to use a color other than this, then you can use the RGB hex code of that specific color to set us a custom color as well.', 'start': 22016.386, 'duration': 7.694}, {'end': 22030.581, 'text': 'So you can set the color and you can set the line style so you can have a solid line or you can have a dashed line.', 'start': 22025.236, 'duration': 5.345}, {'end': 22035.426, 'text': 'Maybe you have one primary metric that you want to show using a solid line and the rest, you want to make them dashed.', 'start': 22030.621, 'duration': 4.805}, {'end': 22036.146, 'text': 'You can do that.', 'start': 22035.566, 'duration': 0.58}, {'end': 22041.331, 'text': 'And then you can set the width of the line in each lines, which can be set separately.', 'start': 22036.807, 'duration': 4.524}, {'end': 22049.119, 'text': 'You can set the size of the markers and you can modify some other, some colors of the markers as well, like the edge color, which is.', 'start': 22041.431, 'duration': 7.688}, {'end': 22056.082, 'text': 'The outline of the marker, the edge width, the width of the outline and the face color, which is the color of the filling in the marker.', 'start': 22049.679, 'duration': 6.403}, {'end': 22061.884, 'text': 'If it has a filling and you can also change the opacity of that specific 
line plus markers.', 'start': 22056.222, 'duration': 5.662}, {'end': 22067.006, 'text': 'So check out the documentation of plt.plot to learn more about what you can do.', 'start': 22062.444, 'duration': 4.562}, {'end': 22069.427, 'text': "And here we've just included a few examples.", 'start': 22067.466, 'duration': 1.961}, {'end': 22072.949, 'text': 'So for apples, we are going to use a square marker.', 'start': 22069.507, 'duration': 3.442}, {'end': 22080.296, 'text': 'Then the color is going to be blue and it is going to be a solid line with a width of two and a marker size of eight.', 'start': 22073.869, 'duration': 6.427}, {'end': 22084.441, 'text': 'And then the marker edge and the marker color are going to be different as well.', 'start': 22080.677, 'duration': 3.764}, {'end': 22089.487, 'text': 'And then for oranges, we are going to use the red color and we are going to use a dashed line.', 'start': 22085.462, 'duration': 4.025}, {'end': 22094.132, 'text': "So we use two hyphens to indicate a dashed line and then the line width and.", 'start': 22089.567, 'duration': 4.565}, {'end': 22097.515, 'text': 'Marker size, et cetera, will also be different.', 'start': 22095.533, 'duration': 1.982}, {'end': 22100.437, 'text': 'And we are also setting an opacity of 50%.', 'start': 22097.955, 'duration': 2.482}, {'end': 22102.319, 'text': 'So for the oranges.', 'start': 22100.437, 'duration': 1.882}, {'end': 22107.263, 'text': 'So once we run this now, we see that now we have the crop yields in Kanto,', 'start': 22103.079, 'duration': 4.184}, {'end': 22113.648, 'text': 'but now you can see that the lines are a little bit wider and the lines have bigger markers.', 'start': 22107.263, 'duration': 6.385}, {'end': 22118.472, 'text': 'And then the orange one is a little less opaque.', 'start': 22113.788, 'duration': 4.684}, {'end': 22119.633, 'text': 'It has an opacity of 50%.', 'start': 22118.512, 'duration': 1.121}, {'end': 22120.033, 'text': 
'Whereas the.', 'start': 22119.633, 'duration': 0.4}, {'end': 22123.416, 'text': 'blue line, which is apples, is solid.', 'start': 22121.474, 'duration': 1.942}, {'end': 22126.299, 'text': 'So this is how we can modify the lines and the markers.', 'start': 22123.816, 'duration': 2.483}, {'end': 22133.586, 'text': "Now there's one shorthand here, because often the most common thing that you want to do is to specify the type of line,", 'start': 22127.04, 'duration': 6.546}, {'end': 22136.329, 'text': 'the type of marker and the color of the two things.', 'start': 22133.586, 'duration': 2.743}, {'end': 22143.155, 'text': 'So there is an fmt argument that you can pass into plt.plot, and that can specify the line style.', 'start': 22136.829, 'duration': 6.326}, {'end': 22151.259, 'text': 'So the way you specify it, you specify the marker first, then you specify the line style, and then you specify the color all within a single string.', 'start': 22144.116, 'duration': 7.143}, {'end': 22159.783, 'text': "So for example, here, what we're doing is we're saying years, apples, and then fmt is the third argument that goes into plt.plot.", 'start': 22152.48, 'duration': 7.303}, {'end': 22163.105, 'text': 'So you do not need to call it as a named argument.', 'start': 22159.803, 'duration': 3.302}, {'end': 22165.606, 'text': 'So you can just pass in the third argument.', 'start': 22163.645, 'duration': 1.961}, {'end': 22169.16, 'text': 'So here we are saying that it will have a square marker.', 'start': 22166.536, 'duration': 2.624}, {'end': 22171.624, 'text': 'It will have a solid line and it will have a blue color.', 'start': 22169.32, 'duration': 2.304}, {'end': 22179.095, 'text': "Whereas oranges, we're saying that they will have a dashed line and it will have a circular marker and it will have a red color.", 'start': 22172.265, 'duration': 6.83}, {'end': 22187.55, 'text': "So you can use this shorthand from time to time when you're quickly drawing some graphs, 
just to differentiate between different lines.", 'start': 22180.185, 'duration': 7.365}, {'end': 22192.814, 'text': "And once again, you can see that we've still copied over the X label, Y label, title and legend.", 'start': 22188.191, 'duration': 4.623}], 'summary': 'Analyzing crop yields in Kanto using matplotlib to visualize and style data points and lines.', 'duration': 369.429, 'max_score': 21823.385, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI21823385.jpg'}, {'end': 21947.096, 'src': 'embed', 'start': 21917.259, 'weight': 4, 'content': [{'end': 21919.92, 'text': 'And this is where you can actually add markers to the points.', 'start': 21917.259, 'duration': 2.661}, {'end': 21930.565, 'text': 'So to show markers for the data points we can simply use the marker argument of PLT dot plot, and matplotlib supports many different kinds of markers,', 'start': 21920.501, 'duration': 10.064}, {'end': 21934.287, 'text': 'like circles, crosses, squares, diamonds and so on.', 'start': 21930.565, 'duration': 3.722}, {'end': 21936.528, 'text': 'And you can actually see a full list of markers here.', 'start': 21934.407, 'duration': 2.121}, {'end': 21941.553, 'text': 'So you can visit this page, or you can just search for matplotlib markers.', 'start': 21937.711, 'duration': 3.842}, {'end': 21947.096, 'text': 'You can see the value that you need to provide to the marker argument and the symbol it displays on the screen.', 'start': 21941.993, 'duration': 5.103}], 'summary': 'Matplotlib supports various markers like circles, squares, diamonds, etc.', 'duration': 29.837, 'max_score': 21917.259, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI21917259.jpg'}, {'end': 22273.557, 'src': 'embed', 'start': 22248.806, 'weight': 5, 'content': [{'end': 22254.989, 'text': 'Okay. Now, if you want to change the size of the figure, sometimes', 'start': 
22248.806, 'duration': 6.183}, {'end': 22260.852, 'text': 'the figure might be too small and you might want to increase the height of it, or you might want to increase the width of it.', 'start': 22254.989, 'duration': 5.863}, {'end': 22264.234, 'text': 'For that, you can use the PLT dot figure function to change its size.', 'start': 22261.132, 'duration': 3.102}, {'end': 22270.616, 'text': 'So you simply say PLT dot figure, and then you set figsize, and then you give a tuple.', 'start': 22265.494, 'duration': 5.122}, {'end': 22273.557, 'text': 'So you can experiment with what the tuple does.', 'start': 22271.076, 'duration': 2.481}], 'summary': 'Use the plt.figure function to change the figure size.', 'duration': 24.751, 'max_score': 22248.806, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI22248806.jpg'}], 'start': 20236.773, 'title': 'Data analysis and visualization', 'summary': 'Covers data analysis using pandas and manipulation tasks, including healthcare data and COVID-19 stats. 
it also discusses data visualization using matplotlib and seaborn libraries, as well as customizing plot markers, lines, figure size, and style in python.', 'chapters': [{'end': 20561.463, 'start': 20236.773, 'title': 'Pandas data analysis assignment', 'summary': 'Covers the analysis and operation on health-related data from a csv file containing information for 210 countries, including exercises on pandas, a recommended book, and step-by-step guidance for the assignment, focusing on hands-on experience with the pandas library and related tasks like creating data frames, querying and indexing operations, grouping, merging and aggregation, and dealing with missing values.', 'duration': 324.69, 'highlights': ['The chapter covers the analysis and operation on health-related data from a CSV file containing information for 210 countries, including exercises on pandas, a recommended book, and step-by-step guidance for the assignment.', 'Focusing on hands-on experience with the pandas library and related tasks like creating data frames, querying and indexing operations, grouping, merging and aggregation, and dealing with missing values.', 'The assignment involves replacing question marks with appropriate values, expressions, or statements to ensure that the notebook runs end to end properly, along with additional guidelines about running code cells, creating new variables, adding new code, and saving work by running jovian.commit at regular intervals.']}, {'end': 20942.134, 'start': 20561.944, 'title': 'Data analysis with pandas', 'summary': 'Covers various data manipulation tasks including retrieving continents from a data frame, finding the total population, calculating the overall life expectancy, sorting and filtering countries based on population and gdp, and merging covid-19 stats to analyze tests, cases, and deaths per million people.', 'duration': 380.19, 'highlights': ['The chapter covers various data manipulation tasks including retrieving continents from a data 
frame, finding the total population, calculating the overall life expectancy, sorting and filtering countries based on population and GDP, and merging COVID-19 stats to analyze tests, cases, and deaths per million people.', "Count the number of countries where the total tests data is missing, and merge the country's data frame and the COVID data frame using the location column to calculate the tests per million cases per million and deaths per million.", 'Create a data frame containing 10 countries with the lowest GDP per capita among the countries which have a population greater than a hundred million.', "Create a data frame containing 10 countries with the highest population and add a column into the country's data frame to record the overall GDP per country.", 'Retrieve a list of continents from the data frame using pandas methods, find the total population across all the countries listed in the dataset, and create a data frame that counts the number of countries in each continent.']}, {'end': 21545.12, 'start': 20942.134, 'title': 'Healthcare facilities and data visualization', 'summary': 'Discusses the relationship between gdp and healthcare facilities, the process of analyzing covid data for a specific country, and the basics of data visualization using matplotlib and seaborn libraries.', 'duration': 602.986, 'highlights': ['The chapter emphasizes on analyzing the relationship between GDP and healthcare facilities in different countries to determine the impact on healthcare provision, indicating the need for further investigation.', 'The process of analyzing COVID data for a specific country involves extracting the necessary information from the raw data and performing the analysis using Python libraries like Pandas.', 'The basics of data visualization using Matplotlib and Seaborn libraries are introduced, explaining the significance of representing numerical data through graphs and charts for better comprehension.']}, {'end': 21916.799, 'start': 21545.78, 
'title': 'Power of visualization in python', 'summary': 'Illustrates the power of visualization in python using matplotlib to create and customize line plots for agricultural data, including adding axis labels and a legend to improve informativeness and clarity.', 'duration': 371.019, 'highlights': ['The chapter illustrates the power of visualization in Python using matplotlib to create and customize line plots for agricultural data.', 'The chapter emphasizes the importance of adding axis labels and a legend for clarity and informativeness in the visualization.', 'The chapter demonstrates the process of plotting multiple lines within the same graph and interpreting the results to form hypotheses for further investigation.']}, {'end': 22247.685, 'start': 21917.259, 'title': 'Customizing plot markers and lines', 'summary': 'Explains how to customize markers and lines in a plot using plt.dot plot function, supporting different markers, colors, line styles, and marker sizes. it also demonstrates the shorthand method for specifying marker, line style, and color using fmt argument, and the importance of organizing code using functions and modules.', 'duration': 330.426, 'highlights': ['The PLT.dot plot function supports different markers like circles, crosses, squares, etc., and the legend displays the markers accordingly.', 'The plot function allows styling of lines and markers using arguments such as color, line style, width, marker size, edge color, edge width, face color, and opacity.', 'The shorthand method using FMT argument allows specifying marker, line style, and color within a single string when calling the PLT.dot plot function.', 'Organizing code using functions and modules helps in applying consistent styles to multiple plots by passing lists of values and applying the styles.', 'Omitting the line type in the format argument allows only markers to be drawn without the lines, providing flexibility in visualizing data.']}, {'end': 22612.537, 'start': 
22248.806, 'title': 'Changing figure size and style with seaborn', 'summary': 'Discusses changing the size and aspect ratio of figures using plt dot figure function, utilizing seaborn library to improve the aesthetics of graphs, and modifying default styles within matplotlib using rc params, with a demonstration of setting font size to 14 and default figure size to nine comma five.', 'duration': 363.731, 'highlights': ['Demonstrating how to change the size and aspect ratio of figures using PLT dot figure function and a tuple, with an example of using two, two and experimenting with values such as eight and four.', 'Explaining the use of Seaborn library to enhance the look and feel of graphs drawn using matplotlib, and showcasing the application of predefined styles such as white grid and dark grid using SNS dot set style function.', 'Detailing the modification of default styles within matplotlib using RC params, showcasing the changes in font size, default figure size, and face color, with a demonstration of setting font size to 14 and default figure size to nine comma five.']}], 'duration': 2375.764, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI20236773.jpg', 'highlights': ['The chapter covers various data manipulation tasks including retrieving continents, calculating life expectancy, and merging COVID-19 stats.', 'Focusing on hands-on experience with the pandas library and related tasks like creating data frames, querying and indexing operations, grouping, merging, and aggregation.', 'The chapter emphasizes on analyzing the relationship between GDP and healthcare facilities in different countries to determine the impact on healthcare provision.', 'The basics of data visualization using Matplotlib and Seaborn libraries are introduced, explaining the significance of representing numerical data through graphs and charts.', 'The PLT.dot plot function supports different markers like circles, crosses, 
squares, etc., and the legend displays the markers accordingly.', 'Demonstrating how to change the size and aspect ratio of figures using PLT dot figure function and a tuple, with an example of using two, two and experimenting with values such as eight and four.']}, {'end': 24534.27, 'segs': [{'end': 22716.995, 'src': 'embed', 'start': 22686.312, 'weight': 0, 'content': [{'end': 22686.772, 'text': 'So here we have 150 rows.', 'start': 22686.312, 'duration': 0.46}, {'end': 22695.859, 'text': "So for 150 flowers in a garden, let's say we've measured the sepal length, the sepal width.", 'start': 22689.676, 'duration': 6.183}, {'end': 22699.9, 'text': 'So the sepal is a part of the flower and then the petal is another part of the flower.', 'start': 22696.039, 'duration': 3.861}, {'end': 22703.442, 'text': "So then we've also measured the petal length and the petal width.", 'start': 22700.34, 'duration': 3.102}, {'end': 22706.403, 'text': "And we've noted the species of the flower.", 'start': 22704.622, 'duration': 1.781}, {'end': 22708.664, 'text': 'So this is a typical dataset', 'start': 22707.383, 'duration': 1.281}, {'end': 22710.731, 'text': 'that you might work with.', 'start': 22709.81, 'duration': 0.921}, {'end': 22711.551, 'text': 'All right.', 'start': 22711.271, 'duration': 0.28}, {'end': 22713.452, 'text': 'And you can learn more about it.', 'start': 22711.631, 'duration': 1.821}, {'end': 22715.214, 'text': "Here's a link to the Wikipedia page.", 'start': 22713.733, 'duration': 1.481}, {'end': 22716.995, 'text': 'It is called the Iris flower dataset.', 'start': 22715.274, 'duration': 1.721}], 'summary': 'A dataset with 150 rows measures sepal and petal dimensions for iris flowers.', 'duration': 30.683, 'max_score': 22686.312, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI22686312.jpg'}, {'end': 23011.095, 'src': 'embed', 'start': 22979.776, 'weight': 2, 'content': [{'end': 22983.537, 'text': 'And 
then there are some outliers and then you can see another cluster of points around here.', 'start': 22979.776, 'duration': 3.761}, {'end': 22984.917, 'text': 'These are all together.', 'start': 22984.237, 'duration': 0.68}, {'end': 22987.378, 'text': 'And then you can see three, another cluster of points here.', 'start': 22984.977, 'duration': 2.401}, {'end': 22993.563, 'text': 'So there seem to be three clusters of points and you might have a hypothesis here.', 'start': 22988.499, 'duration': 5.064}, {'end': 22997.786, 'text': 'Okay There are three species of flowers, and now we can see about three clusters of points.', 'start': 22993.623, 'duration': 4.163}, {'end': 23002.509, 'text': 'And it makes sense that no different across different species of flowers.', 'start': 22998.386, 'duration': 4.123}, {'end': 23011.095, 'text': 'We may not really see a relationship, but maybe if we just study a single species separately, maybe we might be able to find a relationship.', 'start': 23002.569, 'duration': 8.526}], 'summary': 'Data analysis reveals three distinct clusters of points, suggesting three species of flowers.', 'duration': 31.319, 'max_score': 22979.776, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI22979776.jpg'}, {'end': 23314.745, 'src': 'embed', 'start': 23285.2, 'weight': 3, 'content': [{'end': 23290.102, 'text': 'If it serves, if it serves your purpose, if not draw scatter plot, maybe that will help you.', 'start': 23285.2, 'duration': 4.902}, {'end': 23294.285, 'text': 'That will help you visualize the data better and so on.', 'start': 23291.123, 'duration': 3.162}, {'end': 23299.466, 'text': "The next plot that we're going to look at is called a histogram,", 'start': 23295.421, 'duration': 4.045}, {'end': 23308.899, 'text': 'and a histogram represents the distribution of data for a single column or for a single type of variable, essentially.', 'start': 23299.466, 'duration': 9.433}, {'end': 
23314.745, 'text': 'So what do I mean by that? I think the best way to look at it once again is using an example.', 'start': 23309.983, 'duration': 4.762}], 'summary': 'Use a scatter plot to visualize data better. Next, explore histograms for data distribution.', 'duration': 29.545, 'max_score': 23285.2, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI23285200.jpg'}, {'end': 23719.703, 'src': 'embed', 'start': 23690.766, 'weight': 5, 'content': [{'end': 23692.707, 'text': 'So this is how you can play around with histograms.', 'start': 23690.766, 'duration': 1.941}, {'end': 23695.568, 'text': 'Just change the number of bins and the size of each bin.', 'start': 23692.807, 'duration': 2.761}, {'end': 23702.191, 'text': 'But I hope you get the idea that the basic idea here is it tells you how a single variable like the sepal width,', 'start': 23696.789, 'duration': 5.402}, {'end': 23708.582, 'text': 'how its measurements are distributed along the full range of values that it takes.', 'start': 23703.365, 'duration': 5.217}, {'end': 23713.058, 'text': 'Okay. Now, similar to line charts,', 'start': 23709.815, 'duration': 3.243}, {'end': 23719.703, 'text': 'we can also draw multiple histograms on a single chart and we can reduce the opacity of each histogram so that,', 'start': 23713.058, 'duration': 6.645}], 'summary': 'Histograms show distribution of measurements for a single variable, and multiple histograms can be overlaid on a single chart.', 'duration': 28.937, 'max_score': 23690.766, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI23690766.jpg'}, {'end': 23936.066, 'src': 'embed', 'start': 23907.972, 'weight': 4, 'content': [{'end': 23911.493, 'text': 'And now this histogram obviously is a lot more informative than before.', 'start': 23907.972, 'duration': 3.521}, {'end': 23916.335, 'text': 'So you can see the overall trend, but you can also see the trend 
for each species of flowers.', 'start': 23911.573, 'duration': 4.762}, {'end': 23922.357, 'text': 'So you can see that setosa lies in this range and then virginica in this range and versicolor in this range.', 'start': 23916.355, 'duration': 6.002}, {'end': 23926.318, 'text': 'So let us save and commit our work before continuing.', 'start': 23923.577, 'duration': 2.741}, {'end': 23929.819, 'text': "Okay. So now we've looked at line charts,", 'start': 23927.458, 'duration': 2.361}, {'end': 23932.485, 'text': 'scatter plots and histograms.', 'start': 23931.024, 'duration': 1.461}, {'end': 23936.066, 'text': "The next type of chart that we're going to look at is called the bar chart.", 'start': 23932.885, 'duration': 3.181}], 'summary': 'The transcript discusses the use of histograms to show trends for different flower species and the upcoming topic of bar charts.', 'duration': 28.094, 'max_score': 23907.972, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI23907972.jpg'}, {'end': 24390.76, 'src': 'embed', 'start': 24366.267, 'weight': 1, 'content': [{'end': 24372.89, 'text': 'So Seaborn dot bar plot, and the bar plot is the plot for drawing bar charts in Seaborn, and you can specify.', 'start': 24366.267, 'duration': 6.623}, {'end': 24375.891, 'text': 'So here we have specified the data as the tips DF data frame.', 'start': 24372.93, 'duration': 2.961}, {'end': 24379.333, 'text': 'And here we have now specified the day as the X axis.', 'start': 24376.531, 'duration': 2.802}, {'end': 24384.416, 'text': 'So we now say that we want to group on the day and we want to average the total bill.', 'start': 24379.353, 'duration': 5.063}, {'end': 24390.76, 'text': 'right? 
So on the X axis we have the day and on the Y axis we have the total bill, and Seaborn will automatically calculate the average.', 'start': 24384.416, 'duration': 6.344}], 'summary': 'Using seaborn to create a bar plot for average total bill grouped by day.', 'duration': 24.493, 'max_score': 24366.267, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI24366267.jpg'}], 'start': 22613.638, 'title': 'Visualizing data with scatter plots, seaborn, and histograms', 'summary': "Covers using scatter plots to visualize relationships between variables in a dataset of 150 flowers, analyzing sepal data limitations, seaborn's plotting and data visualization, creating histograms to visualize sepal width distribution, and utilizing histograms and bar charts to visualize trends and compare values, along with analyzing restaurant tips data using seaborn.", 'chapters': [{'end': 22839.553, 'start': 22613.638, 'title': 'Visualizing relationships with scatter plots', 'summary': 'Covers the use of scatter plots to visualize the relationship between two variables in a dataset of 150 flowers, representing sepal length, sepal width, petal length, petal width, and species, with three unique species: setosa, versicolor, and virginica.', 'duration': 225.915, 'highlights': ['The dataset contains information about 150 flowers and includes measurements of sepal length, sepal width, petal length, petal width, and the species of the flower.', 'The dataset contains three unique species of flowers: setosa, versicolor, and virginica.', 'The chapter demonstrates the use of scatter plots to visualize the relationship between the sepal width and sepal length, aiming to understand the correlation between the two variables.']}, {'end': 23158.518, 'start': 22839.593, 'title': 'Scatter plot analysis of flower sepal data', 'summary': 'Discusses the limitations of line charts in visualizing random sepal data, and the process of creating a scatter plot to analyze 
the relationship between sepal length and width, revealing three distinct clusters corresponding to different flower species.', 'duration': 318.925, 'highlights': ['The scatter plot reveals three distinct clusters representing different flower species, aiding in the hypothesis that there are three species of flowers with unique characteristics.', "The use of the 'hue' argument in the scatter plot allows for the differentiation of flower species by color, visually demonstrating the unique characteristics of each species based on sepal length and width.", 'Observations from the scatter plot indicate a rough linear correlation between sepal length and width, with distinct trends for different flower species, potentially inspiring further research into the characteristics and behavior of certain flower species.']}, {'end': 23438.419, 'start': 23158.978, 'title': 'Seaborn plotting and data visualization', 'summary': "Covers the interoperability of seaborn and matplotlib, modifying plots using seaborn, and utilizing seaborn's support for pandas data frames. 
It also explains the usage and importance of scatter plots, line plots, and histograms for visualizing data.", 'duration': 279.441, 'highlights': ['Seaborn interoperability with Matplotlib allows mixing and matching functions, providing flexibility in modifying plots.', 'Seaborn offers great inbuilt support for pandas data frames, simplifying the process of visualizing data.', "Histograms are essential for visualizing the distribution of data points within a specific range of values, providing a clearer understanding of the data's distribution."]}, {'end': 23834, 'start': 23438.859, 'title': 'Histogram analysis of sepal width', 'summary': 'The chapter discusses the process of creating histograms to visualize the distribution of sepal width measurements, demonstrating how to determine the intervals, change the number of bins, and overlay multiple histograms with reduced opacity, providing insights into the distribution of measurements for different flower species.', 'duration': 395.141, 'highlights': ['The process of creating histograms to visualize the distribution of sepal width measurements is explained, including determining intervals and changing the number of bins.', 'Overlaying multiple histograms with reduced opacity is demonstrated to provide insights into the distribution of measurements for different flower species.', 'The distribution of sepal width measurements for setosa and versicolor flowers is visualized using histograms, revealing distinct patterns in the distribution of measurements for the two species.', "The concept of bins in histograms is elaborated, highlighting the ability to specify the number of bins and the intervals for creating bins, enabling customization of the histogram's visual representation."]}, {'end': 24185.683, 'start': 23834.941, 'title': 'Stacking histograms and bar charts', 'summary': 'The chapter explores the concept of stacking histograms and bar charts to visualize trends and compare individual values, demonstrating the process through code 
examples and comparing the visualizations with line charts, highlighting the overall trend and the trend for each species of flowers, and showcasing the ease of comparing data across multiple years with bar charts and the ability to stack bars on top of one another for visualizing the overall yield of fruits.', 'duration': 350.742, 'highlights': ['The bars get stacked, providing detailed information for each species of flowers with the overall trend, showcasing the ease of comparing data across multiple years with bar charts.', 'The chapter demonstrates the process through code examples and compares the visualizations with line charts, highlighting the overall trend and the trend for each species of flowers.', 'The chapter showcases the ability to stack bars on top of one another for visualizing the overall yield of fruits, providing a convenient case with a single value for each item on the X axis.']}, {'end': 24534.27, 'start': 24185.983, 'title': 'Analyzing restaurant tips data with seaborn', 'summary': 'Demonstrates using the seaborn library to analyze the tips dataset, providing insights into the average total bill comparison across different days, genders, and smoking habits and the automatic computation of averages and visualizations with seaborn.', 'duration': 348.287, 'highlights': ['Seaborn provides helper functions to automatically calculate averages and draw visualizations with confidence intervals, reducing the need for manual group by operations and plotting.', 'The average total bill comparison across different days is automatically computed and visualized using Seaborn, showing variations and trends in the data.', "Using Seaborn's barplot with hue argument, the average total bill comparison between male and female customers is visually represented, indicating differences in average total bill and variation between genders.", 'Investigating the average total bill for smokers reveals higher average bills with significant variations, particularly 
noticeable on Fridays.']}], 'duration': 1920.632, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI22613638.jpg', 'highlights': ['The dataset contains information about 150 flowers and includes measurements of sepal length, sepal width, petal length, petal width, and the species of the flower.', 'Seaborn provides helper functions to automatically calculate averages and draw visualizations with confidence intervals, reducing the need for manual group by operations and plotting.', 'The scatter plot reveals three distinct clusters representing different flower species, aiding in the hypothesis that there are three species of flowers with unique characteristics.', "Histograms are essential for visualizing the distribution of data points within a specific range of values, providing a clearer understanding of the data's distribution.", 'The bars get stacked, providing detailed information for each species of flowers with the overall trend, showcasing the ease of comparing data across multiple years with bar charts.', 'The process of creating histograms to visualize the distribution of sepal width measurements is explained, including determining intervals and changing the number of bins.']}, {'end': 26024.782, 'segs': [{'end': 24562.523, 'src': 'embed', 'start': 24535.714, 'weight': 1, 'content': [{'end': 24539.956, 'text': 'Now, one thing that you can also do is you can make the bars horizontal simply by switching the axes.', 'start': 24535.714, 'duration': 4.242}, {'end': 24546.838, 'text': 'So you can go from day total bill to total bill day, and that will switch the axes and draw the bars horizontally.', 'start': 24540.016, 'duration': 6.822}, {'end': 24550.939, 'text': "And sometimes these are slightly nicer to look at, and it's really up to you.", 'start': 24546.898, 'duration': 4.041}, {'end': 24560.023, 'text': 'And like on a case by case basis, you decide what looks better for your use case and do that a good 
point where you might want to make it.', 'start': 24551.159, 'duration': 8.864}, {'end': 24562.523, 'text': 'You might want to make these.', 'start': 24560.463, 'duration': 2.06}], 'summary': 'Switching the axes can make bars horizontal, providing a different visual perspective.', 'duration': 26.809, 'max_score': 24535.714, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI24535714.jpg'}, {'end': 24600.288, 'src': 'embed', 'start': 24573.46, 'weight': 2, 'content': [{'end': 24577.642, 'text': "let's say the X axis was countries and not days of the week.", 'start': 24573.46, 'duration': 4.182}, {'end': 24581.044, 'text': "So then you would need to show 200 countries and that's very difficult to show.", 'start': 24577.742, 'duration': 3.302}, {'end': 24585.786, 'text': 'So rather what you might want to do then is use a horizontal plot.', 'start': 24581.684, 'duration': 4.102}, {'end': 24588.168, 'text': 'So you can show a list of 200 countries here, and then the', 'start': 24585.906, 'duration': 2.262}, {'end': 24591.833, 'text': "Y axis could represent, let's say, something like population.", 'start': 24589.329, 'duration': 2.504}, {'end': 24592.975, 'text': 'All right.', 'start': 24592.655, 'duration': 0.32}, {'end': 24595.54, 'text': "So that's it about bar plots.", 'start': 24594.137, 'duration': 1.403}, {'end': 24600.288, 'text': "And once again, let's save and commit our work before continuing.", 'start': 24596.081, 'duration': 4.207}], 'summary': 'Suggests using a horizontal plot to display 200 countries by population for better visualization.', 'duration': 26.828, 'max_score': 24573.46, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI24573460.jpg'}, {'end': 24746.456, 'src': 'embed', 'start': 24716.849, 'weight': 3, 'content': [{'end': 24720.77, 'text': 'What does the trend of passengers coming to this airport look like?', 'start': 
24716.849, 'duration': 3.921}, {'end': 24729.036, 'text': 'Is it increasing or is it decreasing? And we could do a line plot so we could simply do PLT dot plot.', 'start': 24721.07, 'duration': 7.966}, {'end': 24733.881, 'text': "So this, we have, let's say, let's call this DF.", 'start': 24730.397, 'duration': 3.484}, {'end': 24737.032, 'text': "Let's simply do DF dot passengers.", 'start': 24735.151, 'duration': 1.881}, {'end': 24746.456, 'text': "And it seems like there's an increasing trend, but you can see that things go up and down, and a nicer way to visualize this, in my opinion,", 'start': 24738.172, 'duration': 8.284}], 'summary': 'Passenger trend at the airport seems to be increasing with fluctuations.', 'duration': 29.607, 'max_score': 24716.849, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI24716849.jpg'}, {'end': 25046.818, 'src': 'embed', 'start': 25022.031, 'weight': 0, 'content': [{'end': 25028.833, 'text': "So, next up, we've looked at five types of graphs, and that's pretty much it in terms of the types of plots that we are looking at.", 'start': 25022.031, 'duration': 6.802}, {'end': 25032.114, 'text': 'but matplotlib can also be used to display images.', 'start': 25028.833, 'duration': 3.281}, {'end': 25036.935, 'text': "So to display an image, let's first download an image from the internet.", 'start': 25032.714, 'duration': 4.221}, {'end': 25038.856, 'text': 'So this is the, this is the image.', 'start': 25037.035, 'duration': 1.821}, {'end': 25041.757, 'text': 'You can follow this link to see where it, what it looks like.', 'start': 25038.896, 'duration': 2.861}, {'end': 25046.818, 'text': 'And we are using the URL retrieve function from the URL lib dot request module for this.', 'start': 25042.337, 'duration': 4.481}], 'summary': 'Covered five types of graphs, also able to display images using matplotlib.', 'duration': 24.787, 'max_score': 25022.031, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI25022031.jpg'}], 'start': 24535.714, 'title': 'Visualization techniques in matplotlib', 'summary': 'Covers bar plot and common chart types, visualizing passenger trend, matplotlib graphs, image display, and graph plotting techniques. it explains switching axis for horizontal bars, visualizing passenger trend from 1949 to 1960, customizing subplots, and techniques for creating various types of plots using matplotlib and seaborn.', 'chapters': [{'end': 24657.657, 'start': 24535.714, 'title': 'Bar plot and common chart types', 'summary': 'Discusses how to switch the axis to draw bars horizontally, and explains when to use horizontal bars with a large number of values or when visualizing two-dimensional data using a heat map.', 'duration': 121.943, 'highlights': ['Switching the axis from day total bill to total bill day allows drawing bars horizontally, providing an alternative visualization method for the data.', 'Using horizontal bars is recommended when visualizing a large number of values, such as 200 countries on the x-axis, making it difficult to show on a vertical plot.', 'Heat maps are used to visualize two-dimensional data like a matrix or table using colors, providing a unique way to represent complex data relationships.']}, {'end': 24867.529, 'start': 24658.097, 'title': 'Visualizing passenger trend', 'summary': 'Discusses visualizing the trend of passengers visiting an airport from 1949 to 1960 using a line plot and a heat map, with the suggestion of a heat map for a better visualization and analysis of the trend.', 'duration': 209.432, 'highlights': ['The trend of passengers visiting the airport can be visualized using a line plot, showing an increasing trend with fluctuations.', 'Using a heat map for a better visualization and analysis of the trend is suggested, requiring the data to be represented as a matrix using the pivot method.', 'The data is represented in a 
matrix format, with months as rows and years as columns, to facilitate visualization and trend analysis.']}, {'end': 25714.609, 'start': 24868.149, 'title': 'Matplotlib graphs and image display', 'summary': 'Discusses visualizing data using heatmaps and displaying images with matplotlib. it also explores plotting multiple charts in a grid using subplots and customizing the appearance of subplots.', 'duration': 846.46, 'highlights': ['The chapter discusses visualizing data using heatmaps and displaying images with Matplotlib.', 'It also explores plotting multiple charts in a grid using subplots and customizing the appearance of subplots.']}, {'end': 26024.782, 'start': 25714.669, 'title': 'Graph plotting techniques', 'summary': 'Covers techniques for creating various types of plots, including scatter plots, line graphs, histograms, bar plots, heat maps, and images using matplotlib and seaborn, with a focus on techniques for visualizing relationships between variables and distributions of values.', 'duration': 310.113, 'highlights': ['The chapter covers techniques for creating various types of plots, including scatter plots, line graphs, histograms, bar plots, heat maps, and images using matplotlib and seaborn.', 'Techniques for visualizing relationships between variables and distributions of values are emphasized, such as the use of scatter plots to visualize the relationship between two variables and histograms to visualize the distribution of values of a single variable.', 'The scatter plot is used to visualize the relationship between two variables, especially when there are a lot of values within a small range.']}], 'duration': 1489.068, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI24535714.jpg', 'highlights': ['The chapter covers techniques for creating various types of plots, including scatter plots, line graphs, histograms, bar plots, heat maps, and images using matplotlib and seaborn.', 'Switching 
the axis from day total bill to total bill day allows drawing bars horizontally, providing an alternative visualization method for the data.', 'Using horizontal bars is recommended when visualizing a large number of values, such as 200 countries on the x-axis, making it difficult to show on a vertical plot.', 'The trend of passengers visiting the airport can be visualized using a line plot, showing an increasing trend with fluctuations.']}, {'end': 28058.404, 'segs': [{'end': 26064.491, 'src': 'embed', 'start': 26025.99, 'weight': 2, 'content': [{'end': 26029.072, 'text': 'And we saw that we can stack histograms for multiple species.', 'start': 26025.99, 'duration': 3.082}, {'end': 26030.673, 'text': 'So we did a filtering.', 'start': 26029.472, 'duration': 1.201}, {'end': 26041.419, 'text': 'We created a list of values for each species and then we use different colors to plot them so that we can see the overall relationship as well as the relationship for individual species of flowers.', 'start': 26030.733, 'duration': 10.686}, {'end': 26042.54, 'text': 'So that was the histogram.', 'start': 26041.479, 'duration': 1.061}, {'end': 26044.621, 'text': 'But then we looked at a bar plot.', 'start': 26043.28, 'duration': 1.341}, {'end': 26046.862, 'text': 'So our bar plot.', 'start': 26045.101, 'duration': 1.761}, {'end': 26054.606, 'text': 'in this case, we use the restaurant bills data set and the bar plot was used to represent the average bill on a weekday,', 'start': 26046.862, 'duration': 7.744}, {'end': 26059.048, 'text': 'the average total bill on each weekday, and to be able to compare them side by side.', 'start': 26054.606, 'duration': 4.442}, {'end': 26064.491, 'text': 'So for instance, we could see that the bills on Sunday were the highest and the bills on Thursday were the lowest.', 'start': 26059.468, 'duration': 5.023}], 'summary': 'Stacked histograms show relationship for multiple species, bar plot compares average bills on weekdays.', 'duration': 
38.501, 'max_score': 26025.99, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI26025990.jpg'}, {'end': 26135.42, 'src': 'embed', 'start': 26085.421, 'weight': 3, 'content': [{'end': 26090.523, 'text': 'So we looked at heat maps to visualize flight traffic into a particular airport over time.', 'start': 26085.421, 'duration': 5.102}, {'end': 26097.029, 'text': 'So from 1949 to 1960, and then we saw how the footfall increased over the years.', 'start': 26090.723, 'duration': 6.306}, {'end': 26100.473, 'text': 'but within a year the footfall increased up to July, and then it tended to decrease.', 'start': 26097.029, 'duration': 3.444}, {'end': 26104.76, 'text': 'Right And a heat map was a good way to visualize that in a compact fashion.', 'start': 26100.774, 'duration': 3.986}, {'end': 26108.505, 'text': 'And finally, we also looked at how to display images.', 'start': 26105.641, 'duration': 2.864}, {'end': 26117.47, 'text': 'We first load up an image using Image.open from PIL.', 'start': 26110.066, 'duration': 7.404}, {'end': 26120.432, 'text': 'And then we can display it using plt.imshow.', 'start': 26118.031, 'duration': 2.401}, {'end': 26128.777, 'text': 'We also learned that images in Python are actually represented using NumPy arrays with a certain number of rows, columns, and pixel values.', 'start': 26120.892, 'duration': 7.885}, {'end': 26135.42, 'text': 'So they are 3D arrays and you can actually select a specific slice of the image and just display a small part of the image.', 'start': 26128.797, 'duration': 6.623}], 'summary': 'Analyzed flight traffic heat maps from 1949 to 1960, observed footfall increase, and learned to display images using Python and NumPy arrays.', 'duration': 49.999, 'max_score': 26085.421, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI26085421.jpg'}, {'end': 26265.973, 'src': 'embed', 'start': 26227.313, 'weight': 0, 
'content': [{'end': 26233.62, 'text': 'So here you can see that this is just a few lines of code, less than 20 lines of code, and you get this beautiful visualization.', 'start': 26227.313, 'duration': 6.307}, {'end': 26235.783, 'text': 'and all of these contain Seaborn.', 'start': 26233.62, 'duration': 2.163}, {'end': 26237.104, 'text': 'contains example data sets.', 'start': 26235.783, 'duration': 1.321}, {'end': 26240.188, 'text': 'So you can just try out, try it out using these example data sets.', 'start': 26237.164, 'duration': 3.024}, {'end': 26241.409, 'text': 'So do try it out.', 'start': 26240.608, 'duration': 0.801}, {'end': 26245.394, 'text': "If you just do it once, you're going to get a good idea, a good handle over Seaborn.", 'start': 26241.689, 'duration': 3.705}, {'end': 26249.4, 'text': 'Then similarly, we have this a gallery for matplotlib.', 'start': 26246.598, 'duration': 2.802}, {'end': 26253.083, 'text': 'Now since Seaborn builds on top of matplotlib about that.', 'start': 26250.041, 'duration': 3.042}, {'end': 26259.428, 'text': 'So for matplotlib, you will have to write a little more code, but matplotlib is equally powerful ultimately,', 'start': 26253.323, 'duration': 6.105}, {'end': 26262.07, 'text': 'and it gives you a lot more control over your plots.', 'start': 26259.428, 'duration': 2.642}, {'end': 26265.973, 'text': 'So you can open up any of these examples and see.', 'start': 26262.63, 'duration': 3.343}], 'summary': 'Seaborn offers concise code for beautiful visualizations, with less than 20 lines, while matplotlib provides more control with additional code.', 'duration': 38.66, 'max_score': 26227.313, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI26227313.jpg'}, {'end': 26440.68, 'src': 'embed', 'start': 26413.096, 'weight': 5, 'content': [{'end': 26417.309, 'text': 'Okay So with that, we complete our discussion of data visualization.', 'start': 26413.096, 'duration': 
4.213}, {'end': 26419.512, 'text': 'So let us open up the course project.', 'start': 26417.79, 'duration': 1.722}, {'end': 26422.957, 'text': 'This is one of the most interesting things, interesting parts of this course.', 'start': 26419.832, 'duration': 3.125}, {'end': 26431.714, 'text': 'So the objective of the course project is to apply all the skills and techniques that you have learned during the course onto a real world dataset.', 'start': 26424.11, 'duration': 7.604}, {'end': 26436.017, 'text': 'And the thing about the course project is well to get it accepted.', 'start': 26432.415, 'duration': 3.602}, {'end': 26440.68, 'text': "There is a certain evaluation criteria that you have to complete, and we'll talk about that,", 'start': 26436.557, 'duration': 4.123}], 'summary': 'Course project applies learned skills to real dataset. focus on meeting evaluation criteria.', 'duration': 27.584, 'max_score': 26413.096, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI26413096.jpg'}, {'end': 26793.212, 'src': 'embed', 'start': 26768.158, 'weight': 13, 'content': [{'end': 26774.301, 'text': 'You can work on the data as you see fit, but these are just some guidelines to help you along the process, okay?', 'start': 26768.158, 'duration': 6.143}, {'end': 26781.684, 'text': "So once you're done with the data preparation and cleaning, then you will perform some exploratory analysis and visualization.", 'start': 26774.881, 'duration': 6.803}, {'end': 26788.248, 'text': 'So what this means is you will compute the mean The average, the sum range and other interesting statistics for numeric columns.', 'start': 26781.724, 'duration': 6.524}, {'end': 26793.212, 'text': "So here we are still discovering you're going one step deeper where you're looking into specific columns.", 'start': 26788.268, 'duration': 4.944}], 'summary': 'Perform data preparation, cleaning, exploratory analysis, and visualization with computation of 
mean, average, sum, range, and other statistics for numeric columns.', 'duration': 25.054, 'max_score': 26768.158, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI26768158.jpg'}, {'end': 27246.702, 'src': 'embed', 'start': 27217.677, 'weight': 14, 'content': [{'end': 27223.098, 'text': 'The evaluation criteria is this that your submission will be evaluated on these criteria.', 'start': 27217.677, 'duration': 5.421}, {'end': 27226.899, 'text': 'that the dataset must contain at least three columns and 150 rows of data.', 'start': 27223.098, 'duration': 3.801}, {'end': 27232.96, 'text': 'So this is just so that it has enough of a breadth, enough information within the data.', 'start': 27227.159, 'duration': 5.801}, {'end': 27237.66, 'text': 'You must ask and answer at least five questions about the dataset.', 'start': 27233.7, 'duration': 3.96}, {'end': 27240.901, 'text': 'So asking the right questions is also a skill.', 'start': 27238.741, 'duration': 2.16}, {'end': 27246.702, 'text': "So that's why you have to ask, come up with five interesting original questions about the dataset.", 'start': 27241.221, 'duration': 5.481}], 'summary': 'Submission will be evaluated based on dataset criteria: 3 columns, 150 rows; 5 questions required.', 'duration': 29.025, 'max_score': 27217.677, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI27217677.jpg'}, {'end': 27308.045, 'src': 'embed', 'start': 27279.169, 'weight': 4, 'content': [{'end': 27284.254, 'text': 'Presentation is an important part of the, of data science and especially of doing projects.', 'start': 27279.169, 'duration': 5.085}, {'end': 27290.197, 'text': 'And then your work must not be plagiarized that you should not have copy pasted the entire thing from somewhere else.', 'start': 27284.975, 'duration': 5.222}, {'end': 27293.839, 'text': 'Now you can see a lot of these datasets are already analyzed.', 
'start': 27290.657, 'duration': 3.182}, {'end': 27297.52, 'text': "There's a lot of information available online, so you can repeat what somebody else has done.", 'start': 27293.859, 'duration': 3.661}, {'end': 27303.343, 'text': "But just there's a difference between simply copy pasting where you do not really understand what has happened,", 'start': 27298.161, 'duration': 5.182}, {'end': 27308.045, 'text': 'versus borrowing the right things from specific notebooks or tutorials.', 'start': 27303.343, 'duration': 4.702}], 'summary': 'Data science projects should avoid plagiarism and focus on understanding and borrowing from existing analyses.', 'duration': 28.876, 'max_score': 27279.169, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI27279169.jpg'}, {'end': 27367.304, 'src': 'embed', 'start': 27338.186, 'weight': 6, 'content': [{'end': 27339.647, 'text': 'So these are the evaluation criteria.', 'start': 27338.186, 'duration': 1.461}, {'end': 27341.808, 'text': 'This is how we will evaluate your work.', 'start': 27339.667, 'duration': 2.141}, {'end': 27349.452, 'text': 'Now, I just want to spend a little more time talking about some of the interesting, some of the data sets that you can use for working.', 'start': 27342.268, 'duration': 7.184}, {'end': 27354.775, 'text': 'So we have picked, we have created a list of a few data sources from where you can get data sets.', 'start': 27349.552, 'duration': 5.223}, {'end': 27357.296, 'text': 'So the first one is Kaggle datasets.', 'start': 27355.515, 'duration': 1.781}, {'end': 27360.539, 'text': 'This is the easiest one to get started with.', 'start': 27357.476, 'duration': 3.063}, {'end': 27367.304, 'text': 'So you just click on Kaggle datasets and you will find a whole bunch of, there are over 50, 000 datasets you can pick from.', 'start': 27361.239, 'duration': 6.065}], 'summary': 'Evaluation criteria discussed, including access to 50,000+ datasets on kaggle.', 
'duration': 29.118, 'max_score': 27338.186, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI27338186.jpg'}, {'end': 27436.61, 'src': 'embed', 'start': 27409.647, 'weight': 7, 'content': [{'end': 27415.069, 'text': "So that's one example, the us elections dataset, and there are lots more that you can look through here.", 'start': 27409.647, 'duration': 5.422}, {'end': 27416.47, 'text': 'You can also filter.', 'start': 27415.649, 'duration': 0.821}, {'end': 27420.635, 'text': 'There are a lot of filters that you can use for this dataset for these datasets.', 'start': 27416.49, 'duration': 4.145}, {'end': 27422.538, 'text': "So that's Kaggle datasets.", 'start': 27421.116, 'duration': 1.422}, {'end': 27425.241, 'text': 'Then similarly, you have the UCI machine learning repository.', 'start': 27422.698, 'duration': 2.543}, {'end': 27427.444, 'text': 'This has a whole bunch of datasets too.', 'start': 27425.741, 'duration': 1.703}, {'end': 27431.268, 'text': 'So here you can, for example, just click on view all datasets here.', 'start': 27427.584, 'duration': 3.684}, {'end': 27432.629, 'text': 'and you can see.', 'start': 27432.089, 'duration': 0.54}, {'end': 27436.61, 'text': 'you can even filter it out with different areas, different attributes.', 'start': 27432.629, 'duration': 3.981}], 'summary': 'Kaggle and uci offer numerous datasets for exploration and filtering.', 'duration': 26.963, 'max_score': 27409.647, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI27409647.jpg'}, {'end': 27510.413, 'src': 'embed', 'start': 27471.474, 'weight': 9, 'content': [{'end': 27474.417, 'text': 'Okay Then you have awesome public data sets.', 'start': 27471.474, 'duration': 2.943}, {'end': 27475.919, 'text': 'This is also another great source.', 'start': 27474.437, 'duration': 1.482}, {'end': 27481.345, 'text': 'So this is a GitHub repository from where you can find 
links to a lot of data sets in many different areas.', 'start': 27476.019, 'duration': 5.326}, {'end': 27489.294, 'text': "So if you are from a different domain, like whatever domain you're from, or you're currently working in, find it interesting data set there, use that.", 'start': 27481.826, 'duration': 7.468}, {'end': 27491.116, 'text': 'And then you have.', 'start': 27490.335, 'duration': 0.781}, {'end': 27494.719, 'text': 'We also linked to a place you have Google dataset search.', 'start': 27492.077, 'duration': 2.642}, {'end': 27500.384, 'text': 'So on Google dataset search, you can find data for anything and everything pretty much.', 'start': 27495.2, 'duration': 5.184}, {'end': 27502.907, 'text': 'So, for instance, if we try global temperatures,', 'start': 27500.424, 'duration': 2.483}, {'end': 27510.413, 'text': 'you can see that there is some data here about global surface temperatures and you can then get this data.', 'start': 27502.907, 'duration': 7.506}], 'summary': 'Github repository offers access to diverse datasets; google dataset search provides wide-ranging data, e.g. 
global temperatures.', 'duration': 38.939, 'max_score': 27471.474, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI27471474.jpg'}, {'end': 27662.416, 'src': 'embed', 'start': 27634.469, 'weight': 16, 'content': [{'end': 27637.832, 'text': 'So these are data sets that we would highly recommend that you can analyze.', 'start': 27634.469, 'duration': 3.363}, {'end': 27642.957, 'text': 'So there are some game data sets like Dota, cricket, basketball, PUBG, and so on.', 'start': 27638.493, 'duration': 4.464}, {'end': 27649.486, 'text': 'And then there is also data that you can download from your personal apps that you use.', 'start': 27644.062, 'duration': 5.424}, {'end': 27655.611, 'text': "So things like WhatsApp, Chrome, Google calendar, Apple's apps, Instagram, Facebook, LinkedIn.", 'start': 27649.506, 'duration': 6.105}, {'end': 27658.593, 'text': 'And that is also another really interesting thing to do.', 'start': 27656.491, 'duration': 2.102}, {'end': 27662.416, 'text': 'So now what I want to do is I want to show you a few examples.', 'start': 27659.193, 'duration': 3.223}], 'summary': 'Analyze recommended game and personal app data sets for insights.', 'duration': 27.947, 'max_score': 27634.469, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI27634469.jpg'}], 'start': 26025.99, 'title': 'Data visualization techniques and project guidelines', 'summary': 'Covers visualization techniques including histograms, bar plots, heat maps, and image display using numpy arrays. it also discusses data visualization tools seaborn and matplotlib, emphasizing the ease and power of seaborn. 
furthermore, it outlines project guidelines for dataset selection, data cleaning, exploratory analysis, visualization, and project submission criteria.', 'chapters': [{'end': 26135.42, 'start': 26025.99, 'title': 'Data visualization techniques', 'summary': "Covered the visualization of multiple species' histograms, comparison of average restaurant bills by weekday using bar plots, visualization of flight traffic over time using heat maps, and display of images using numpy arrays.", 'duration': 109.43, 'highlights': ["We learned how to visualize multiple species' histograms by stacking them and using different colors to plot their relationships.", 'The bar plot was used to represent the average restaurant bill by weekday, showcasing that Sunday had the highest bills and Thursday had the lowest.', 'Heat maps were used to visualize flight traffic into a particular airport over time, revealing an increase in footfall from 1949 to 1960, with a decrease after July within each year.', 'We also learned how to display images by loading them as NumPy arrays and selecting specific slices for display.']}, {'end': 26471.938, 'start': 26135.961, 'title': 'Data visualization techniques and tools', 'summary': 'Provides a comprehensive overview of data visualization tools such as seaborn and matplotlib, offering practical guidance on their usage, benefits, and differences, emphasizing the ease and power of seaborn for creating visually appealing plots with minimal code.', 'duration': 335.977, 'highlights': ['Seaborn provides powerful visualization capabilities with minimal code, allowing the creation of visually appealing plots using less than 20 lines of code.', 'Matplotlib, while requiring more code, offers greater control over plots, making it suitable for more complex visualizations and providing extensive documentation and tutorials for in-depth learning.', 'The course project encourages applying learned skills to real-world datasets, allowing for open-ended exploration and 
the creation of a diverse range of graphs and insights to showcase on portfolios and professional profiles.']}, {'end': 27066.829, 'start': 26473.007, 'title': 'Project guidelines and dataset selection', 'summary': 'Outlines guidelines for selecting a real-world dataset, including the preferred formats (csv, json, excel), minimum requirements of three columns and 150 rows, and advice on data cleaning and preparation, exploratory analysis, visualization, and asking and answering questions about the data.', 'duration': 593.822, 'highlights': ['Guidelines for dataset selection', 'Preferred formats and minimum requirements', 'Data cleaning and preparation', 'Exploratory analysis and visualization', 'Asking and answering questions about the data']}, {'end': 27315.787, 'start': 27067.469, 'title': 'Data science project submission', 'summary': 'Emphasizes the importance of presentation in data science projects, providing guidelines for creating and submitting a project, including evaluation criteria such as data set requirements, question quantity, visualizations, and avoiding plagiarism.', 'duration': 248.318, 'highlights': ['The chapter emphasizes the importance of presentation in data science projects', 'Guidelines for creating and submitting a project, including evaluation criteria such as data set requirements, question quantity, visualizations, and avoiding plagiarism', 'Evaluation criteria such as data set must contain at least three columns and 150 rows of data, ask and answer at least five questions about the dataset, include at least five visualizations or graphs, and include explanations using Markdown cells']}, {'end': 28058.404, 'start': 27315.787, 'title': 'Data sources and project evaluation', 'summary': 'Explains the criteria for project evaluation, recommends data sources like kaggle datasets, uci machine learning repository, awesome public data sets, and google dataset search, and provides examples of good projects, such as analyzing personal browser 
history, covid-19 data, and whatsapp data.', 'duration': 742.617, 'highlights': ['The chapter explains the criteria for project evaluation, recommends data sources like Kaggle datasets, UCI machine learning repository, awesome public data sets, and Google dataset search, and provides examples of good projects, such as analyzing personal browser history, COVID-19 data, and WhatsApp data.', 'Kaggle datasets provides access to over 50,000 datasets, allowing users to download data in CSV format, such as the US elections dataset.', 'UCI machine learning repository offers various datasets with different data types and attributes, like the breast cancer dataset containing around 2000 data.', 'Awesome public data sets on GitHub repository offers links to diverse datasets across different domains, encouraging users to explore and use relevant datasets.', 'Google dataset search allows users to find a wide range of data and provides an example of searching for global temperatures data.']}], 'duration': 2032.414, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI26025990.jpg', 'highlights': ['Seaborn provides powerful visualization capabilities with minimal code, allowing visually appealing plots using less than 20 lines of code.', 'Matplotlib offers greater control over plots, suitable for complex visualizations and providing extensive documentation and tutorials for in-depth learning.', 'The bar plot showcased the average restaurant bill by weekday, revealing that Sunday had the highest bills and Thursday had the lowest.', 'Heat maps visualized flight traffic into a particular airport over time, revealing an increase in footfall from 1949 to 1960, with a decrease after July within each year.', 'The chapter emphasizes the importance of presentation in data science projects and provides guidelines for creating and submitting a project, including evaluation criteria.', 'The course project encourages applying learned skills 
to real-world datasets, allowing for open-ended exploration and the creation of a diverse range of graphs and insights.', 'The chapter explains the criteria for project evaluation, recommends data sources like Kaggle datasets, UCI machine learning repository, awesome public data sets, and Google dataset search.', 'Kaggle datasets provide access to over 50,000 datasets, allowing users to download data in CSV format, such as the US elections dataset.', 'UCI machine learning repository offers various datasets with different data types and attributes, like the breast cancer dataset containing around 2000 data.', 'Awesome public data sets on GitHub repository offers links to diverse datasets across different domains, encouraging users to explore and use relevant datasets.', 'Google dataset search allows users to find a wide range of data and provides an example of searching for global temperatures data.', "We learned how to visualize multiple species' histograms by stacking them and using different colors to plot their relationships.", 'We also learned how to display images by loading them as NumPy arrays and selecting specific slices for display.', 'Guidelines for dataset selection, preferred formats, minimum requirements, data cleaning, preparation, exploratory analysis, and visualization were outlined.', 'Evaluation criteria such as data set must contain at least three columns and 150 rows of data, ask and answer at least five questions about the dataset, include at least five visualizations or graphs, and include explanations using Markdown cells.', 'Data sources like Kaggle datasets, UCI machine learning repository, awesome public data sets, and Google dataset search were recommended for project datasets.', 'Examples of good projects were provided, such as analyzing personal browser history, COVID-19 data, and WhatsApp data.']}, {'end': 29068.392, 'segs': [{'end': 28109.57, 'src': 'embed', 'start': 28081.929, 'weight': 2, 'content': [{'end': 28085.29, 'text': 'So 
if you want to explore more interesting plots, you can check out this tutorial.', 'start': 28081.929, 'duration': 3.361}, {'end': 28088.731, 'text': 'And then there is also a couple more that you can follow.', 'start': 28086.43, 'duration': 2.301}, {'end': 28096.292, 'text': 'Okay We are hoping to have a lot more examples as the course participant work on projects.', 'start': 28088.991, 'duration': 7.301}, {'end': 28102.58, 'text': 'So you can take your projects and then share them on the, on this thread, discuss and share your work thread.', 'start': 28096.693, 'duration': 5.887}, {'end': 28106.564, 'text': 'And you can also share ideas for datasets on the datasets thread.', 'start': 28103.401, 'duration': 3.163}, {'end': 28109.57, 'text': 'So we have a separate thread for datasets.', 'start': 28107.748, 'duration': 1.822}], 'summary': 'Explore interesting plots in tutorial, share projects and datasets on dedicated threads.', 'duration': 27.641, 'max_score': 28081.929, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI28081929.jpg'}, {'end': 28215.969, 'src': 'embed', 'start': 28190.519, 'weight': 0, 'content': [{'end': 28195.946, 'text': 'and i hope that this will be something that you will be able to proudly showcase on your professional profile.', 'start': 28190.519, 'duration': 5.427}, {'end': 28202.745, 'text': 'Our final lecture is exploratory data analysis,', 'start': 28199.784, 'duration': 2.961}, {'end': 28212.468, 'text': 'a case study where we will be taking everything that we have learned in the entire course and bringing it together into a single project where we will be analyzing a real world data set.', 'start': 28202.745, 'duration': 9.723}, {'end': 28215.969, 'text': 'And we will be asking and answering some interesting questions about the data.', 'start': 28212.768, 'duration': 3.201}], 'summary': 'Final lecture: exploratory data analysis case study to showcase skills on professional 
profile.', 'duration': 25.45, 'max_score': 28190.519, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI28190519.jpg'}, {'end': 28280.923, 'src': 'embed', 'start': 28254.801, 'weight': 4, 'content': [{'end': 28259.783, 'text': 'This will take you to the course page and the lecture page where you will be able to see the video.', 'start': 28254.801, 'duration': 4.982}, {'end': 28263.685, 'text': "So the video that you're watching right now will be available as a recording for you to review.", 'start': 28259.983, 'duration': 3.702}, {'end': 28268.001, 'text': 'Now for each lecture, we have been using Jupyter notebooks.', 'start': 28264.696, 'duration': 3.305}, {'end': 28273.028, 'text': 'So today we will be using the Jupyter notebook EDA on Stack Overflow Developer Survey.', 'start': 28268.361, 'duration': 4.667}, {'end': 28277.775, 'text': 'So you can just click on this link and that will bring you to this Jupyter notebook.', 'start': 28273.429, 'duration': 4.346}, {'end': 28280.923, 'text': 'Now you can view this Jupyter notebook.', 'start': 28278.942, 'duration': 1.981}], 'summary': 'Access course and lecture pages, watch recorded videos, and use Jupyter notebooks for EDA.', 'duration': 26.122, 'max_score': 28254.801, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI28254801.jpg'}, {'end': 28437.64, 'src': 'embed', 'start': 28396.878, 'weight': 1, 'content': [{'end': 28405.864, 'text': 'we will be analyzing responses from the Stack Overflow annual developer survey and we will apply all the things that we have learned so far about Python, Jupyter,', 'start': 28396.878, 'duration': 8.986}, {'end': 28410.208, 'text': "NumPy, pandas, matplotlib, Seaborn together and we'll bring everything together.", 'start': 28405.864, 'duration': 4.344}, {'end': 28417.531, 'text': 'Now you can run this code online just in the way that we have, by using the run button that we 
just used to run this code on Binder.', 'start': 28410.808, 'duration': 6.723}, {'end': 28420.112, 'text': 'but you can also run this code on your computer locally.', 'start': 28417.531, 'duration': 2.581}, {'end': 28423.073, 'text': 'So you can follow the instructions given here under option two.', 'start': 28420.472, 'duration': 2.601}, {'end': 28428.716, 'text': "Now, as I mentioned, we'll be analyzing the Stack Overflow developer survey dataset for our analysis.", 'start': 28423.854, 'duration': 4.862}, {'end': 28436.879, 'text': 'And this is an annual survey conducted by Stack Overflow, and you can find the raw data and results on this page insights.stackoverflow.com.', 'start': 28428.996, 'duration': 7.883}, {'end': 28437.64, 'text': 'In fact.', 'start': 28437.36, 'duration': 0.28}], 'summary': 'Analyzing Stack Overflow annual developer survey using Python libraries for data analysis.', 'duration': 40.762, 'max_score': 28396.878, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI28396878.jpg'}, {'end': 28482.137, 'src': 'embed', 'start': 28454.725, 'weight': 6, 'content': [{'end': 28460.388, 'text': "So the first thing that you could do is just download the CSV manually and upload it using Jupyter's graphical interface.", 'start': 28454.725, 'duration': 5.663}, {'end': 28461.829, 'text': "So let's do that.", 'start': 28460.989, 'duration': 0.84}, {'end': 28462.97, 'text': "Let's see how to do that.", 'start': 28462.009, 'duration': 0.961}, {'end': 28471.934, 'text': 'So you click download full dataset and that points us to a Google Drive link, and then you can download the Google Drive link and.', 'start': 28463.43, 'duration': 8.504}, {'end': 28477.256, 'text': "So here I'm downloading it to my desktop and then you can unzip the Google Drive link.", 'start': 28473.155, 'duration': 4.101}, {'end': 28482.137, 'text': 'So once you unzip this link, you will see this folder developer survey 2020.', 'start': 
28477.436, 'duration': 4.701}], 'summary': 'Manually download the csv, upload in jupyter, and unzip to access the folder developer survey 2020.', 'duration': 27.412, 'max_score': 28454.725, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI28454725.jpg'}, {'end': 28593.483, 'src': 'embed', 'start': 28564.153, 'weight': 5, 'content': [{'end': 28566.875, 'text': 'We are going to use a helper library called opendatasets.', 'start': 28564.153, 'duration': 2.722}, {'end': 28573.24, 'text': 'So this is a library that we have created for you to make it easier for you to do data analysis,', 'start': 28567.575, 'duration': 5.665}, {'end': 28577.883, 'text': 'where we are creating a curated collection of datasets for data analysis and machine learning.', 'start': 28573.24, 'duration': 4.643}, {'end': 28582.427, 'text': 'And these datasets can be downloaded into Jupyter with a single Python command.', 'start': 28578.384, 'duration': 4.043}, {'end': 28584.311, 'text': 'This is how it works.', 'start': 28583.55, 'duration': 0.761}, {'end': 28587.715, 'text': 'We have to install this library: pip install opendatasets.', 'start': 28584.611, 'duration': 3.104}, {'end': 28593.483, 'text': "So let's just come back to the Jupyter notebook and let us run pip install opendatasets.", 'start': 28588.456, 'duration': 5.027}], 'summary': 'A helper library called opendatasets simplifies data analysis, providing curated datasets for machine learning, downloadable into jupyter using a single python command.', 'duration': 29.33, 'max_score': 28564.153, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI28564153.jpg'}, {'end': 28741.466, 'src': 'embed', 'start': 28713.037, 'weight': 8, 'content': [{'end': 28721.665, 'text': "Okay. So we will load the CSV files using pandas. Let's import pandas as pd and let's use the pd.read_csv", 'start': 28713.037, 'duration': 8.628}, {'end': 
28730.819, 'text': 'function where we will pass in the path to the survey_results_public.csv file, which contains the survey responses.', 'start': 28723.433, 'duration': 7.386}, {'end': 28733.16, 'text': 'And we are going to call it survey_raw_df.', 'start': 28731.379, 'duration': 1.781}, {'end': 28741.466, 'text': 'So the reason for calling it that is because this is going to be the raw dataset, the unprocessed dataset that we are loading up.', 'start': 28733.581, 'duration': 7.885}], 'summary': 'Using pandas, import survey responses from csv file as survey_raw_df.', 'duration': 28.429, 'max_score': 28713.037, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI28713037.jpg'}, {'end': 29075.217, 'src': 'embed', 'start': 29045.879, 'weight': 9, 'content': [{'end': 29052.926, 'text': 'This is going to ask us for an API key, which we can get from our profile and just paste it in here.', 'start': 29045.879, 'duration': 7.047}, {'end': 29056.449, 'text': 'And that is going to then commit this notebook to our profile.', 'start': 29053.906, 'duration': 2.543}, {'end': 29064.128, 'text': 'Yeah. So now this notebook has been committed, so you can view this notebook on your Jovian profile whenever you want to.', 'start': 29057.543, 'duration': 6.585}, {'end': 29068.392, 'text': 'Wherever you come in from, whether you come in from Binder or from your local computer,', 'start': 29064.128, 'duration': 4.264}, {'end': 29075.217, 'text': 'everything gets saved on your Jovian profile, and then you can take this and run it on Binder whenever you need to continue your work.', 'start': 29068.392, 'duration': 6.825}], 'summary': 'Get api key from profile to commit notebook, which is then accessible on jovian profile and can be run on binder.', 'duration': 29.338, 'max_score': 29045.879, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI29045879.jpg'}], 'start': 28059.344, 'title': 'Data science course project overview', 'summary': 'Provides an overview of the course project, including resources available, guidelines for project work, and the final case study on exploratory data analysis, where real-world datasets will be analyzed and conclusions drawn, to be showcased as a professional profile.', 'chapters': [{'end': 28350.233, 'start': 28059.344, 'title': 'Data science course project overview', 'summary': 'Provides an overview of the course project, including resources available, guidelines for project work, and the final case study on exploratory data analysis, where real-world datasets will be analyzed and conclusions drawn, to be showcased as a professional profile.', 'duration': 290.889, 'highlights': ['The final case study on exploratory data analysis, involving real-world datasets, will be analyzed and conclusions drawn, to be showcased as a professional profile.', 'Guidelines and resources are provided for project work, including the option to seek assistance and discuss ideas on the dedicated forum.', 'Access to course materials, including lecture videos and Jupyter notebooks, is provided on the course page for lesson six.']}, {'end': 28543.08, 'start': 28350.233, 'title': 'Python eda: stack overflow survey', 'summary': 'Covers the process of setting up and analyzing responses from the stack overflow annual developer survey using python, jupyter, numpy, pandas, matplotlib, and seaborn, including instructions for running the code on binder and locally, and methods for downloading and uploading the dataset.', 'duration': 192.847, 'highlights': ['The notebook focuses on analyzing responses from the stack overflow annual developer survey using Python, Jupyter, NumPy, pandas, matplotlib, and Seaborn, and provides instructions for running the code on Binder and locally.', 'The Stack Overflow developer 
survey dataset for analysis is obtained from the annual survey conducted by Stack Overflow, with an emphasis on the 2020 results, and methods for downloading and uploading the dataset are explained.', "Instructions for downloading the dataset into Jupyter are provided, including manual download and upload using Jupyter's graphical interface, and using a direct link to the raw CSV file."]}, {'end': 29068.392, 'start': 28543.82, 'title': 'Using open datasets for data analysis', 'summary': 'Introduces the use of the open datasets library for data analysis and machine learning, demonstrating how to download datasets with a single python command, access schema files, load csv files using pandas, and commit the notebook to a jovian profile.', 'duration': 524.572, 'highlights': ['The chapter introduces the use of the open datasets library for data analysis and machine learning', 'Demonstrates how to download datasets with a single Python command', 'Accessing schema files and loading CSV files using pandas', 'Committing the notebook to a Jovian profile']}], 'duration': 1009.048, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI28059344.jpg', 'highlights': ['The final case study on exploratory data analysis, involving real-world datasets, will be analyzed and conclusions drawn, to be showcased as a professional profile.', 'The notebook focuses on analyzing responses from the stack overflow annual developer survey using Python, Jupyter, NumPy, pandas, matplotlib, and Seaborn, and provides instructions for running the code on Binder and locally.', 'Guidelines and resources are provided for project work, including the option to seek assistance and discuss ideas on the dedicated forum.', 'The Stack Overflow developer survey dataset for analysis is obtained from the annual survey conducted by Stack Overflow, with an emphasis on the 2020 results, and methods for downloading and uploading the dataset are explained.', 'Access 
to course materials, including lecture videos and Jupyter notebooks, is provided on the course page for lesson six.', 'The chapter introduces the use of the open datasets library for data analysis and machine learning', "Instructions for downloading the dataset into Jupyter are provided, including manual download and upload using Jupyter's graphical interface, and using a direct link to the raw CSV file.", 'Demonstrates how to download datasets with a single Python command', 'Accessing schema files and loading CSV files using pandas', 'Committing the notebook to a Jovian profile']}, {'end': 30340.568, 'segs': [{'end': 29092.163, 'src': 'embed', 'start': 29068.392, 'weight': 0, 'content': [{'end': 29075.217, 'text': 'everything gets saved on your Jovian profile and then you can take this and run it on binder whenever you need it to continue your work.', 'start': 29068.392, 'duration': 6.825}, {'end': 29076.758, 'text': 'All right.', 'start': 29076.398, 'duration': 0.36}, {'end': 29078.3, 'text': 'So moving ahead.', 'start': 29077.159, 'duration': 1.141}, {'end': 29081.035, 'text': 'Now we have our data as data frames.', 'start': 29079.454, 'duration': 1.581}, {'end': 29088.78, 'text': 'And while the survey contains a wealth of information, it contains about 65, 000 responses to 60 questions.', 'start': 29081.475, 'duration': 7.305}, {'end': 29092.163, 'text': 'We will limit our analysis to us a few areas.', 'start': 29089.301, 'duration': 2.862}], 'summary': 'Data frames contain 65,000 responses to 60 questions for analysis.', 'duration': 23.771, 'max_score': 29068.392, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI29068392.jpg'}, {'end': 29153.914, 'src': 'embed', 'start': 29130.297, 'weight': 3, 'content': [{'end': 29137.622, 'text': 'And we will also understand some employment related information, professional and information preferences and opinions.', 'start': 29130.297, 'duration': 7.325}, {'end': 
29144.047, 'text': 'So something related to the kind of roles people are holding in data science and, programming fields.', 'start': 29137.842, 'duration': 6.205}, {'end': 29148.432, 'text': 'Okay So to do that, let us select a subset of the columns.', 'start': 29144.727, 'duration': 3.705}, {'end': 29153.914, 'text': 'So here are some columns for demographics, and then here are some columns for the programming experience.', 'start': 29148.612, 'duration': 5.302}], 'summary': 'Analyzing data on employment, professional preferences, and programming experience in data science and programming fields.', 'duration': 23.617, 'max_score': 29130.297, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI29130297.jpg'}, {'end': 29424.461, 'src': 'embed', 'start': 29389.352, 'weight': 5, 'content': [{'end': 29392.515, 'text': 'Okay And we will choose to convert the rest of these into strings.', 'start': 29389.352, 'duration': 3.163}, {'end': 29397.843, 'text': 'And the way to do that is to use the PD dot two numeric function.', 'start': 29393.618, 'duration': 4.225}, {'end': 29401.507, 'text': 'So the PD dot two numeric function, and you can check the documentation.', 'start': 29398.343, 'duration': 3.164}, {'end': 29407.374, 'text': 'It takes a series or a column and it converts that series into a numeric data, right?', 'start': 29401.787, 'duration': 5.587}, {'end': 29410.557, 'text': "So it's going to take all of these and convert these into floats and.", 'start': 29407.394, 'duration': 3.163}, {'end': 29414.239, 'text': 'Wherever it encounters a string, it will show through an error.', 'start': 29411.298, 'duration': 2.941}, {'end': 29424.461, 'text': 'But what we want to do is we want to ignore the errors and simply replace any non-numeric values with the NAN value or the empty placeholder value.', 'start': 29414.699, 'duration': 9.762}], 'summary': 'Using pd.to_numeric to convert series into numeric data and handle 
non-numeric values with nan.', 'duration': 35.109, 'max_score': 29389.352, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI29389352.jpg'}, {'end': 29535.9, 'src': 'embed', 'start': 29507.96, 'weight': 1, 'content': [{'end': 29514.047, 'text': 'Um, there are the mean, the average age is about 30 of the survey respondents, but the average age of first coding.', 'start': 29507.96, 'duration': 6.087}, {'end': 29516.89, 'text': 'So the age at which they wrote their first line of code is around 15.', 'start': 29514.067, 'duration': 2.823}, {'end': 29525.535, 'text': 'Uh, the years of coding is about 12 in on average and so on work weeks are around 40.', 'start': 29516.89, 'duration': 8.645}, {'end': 29527.296, 'text': 'But you will start to notice some problems here.', 'start': 29525.535, 'duration': 1.761}, {'end': 29535.9, 'text': 'It seems like that the minimum age mentioned is one which seems quite unlikely that a one year old infant has filled out the survey,', 'start': 29527.996, 'duration': 7.904}], 'summary': "Survey respondents' average age is 30, first coding age around 15, 12 years of coding on average, and 40 work weeks.", 'duration': 27.94, 'max_score': 29507.96, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI29507960.jpg'}, {'end': 29781.561, 'src': 'embed', 'start': 29753.796, 'weight': 7, 'content': [{'end': 29758.078, 'text': 'So here we have man, woman, non-binary gender, queer or gender non-conforming.', 'start': 29753.796, 'duration': 4.282}, {'end': 29762.52, 'text': 'So these are the three options, but there are cases where people have picked multiple options too.', 'start': 29758.098, 'duration': 4.422}, {'end': 29770.483, 'text': 'So for instance, there are 121 people have picked man and non-binary gender, queer or gender non-conforming.', 'start': 29763, 'duration': 7.483}, {'end': 29773.164, 'text': 'Now, while this is.', 'start': 
29771.663, 'duration': 1.501}, {'end': 29778.718, 'text': 'Acceptable in general, it is going to make our analysis a little bit difficult.', 'start': 29774.235, 'duration': 4.483}, {'end': 29781.561, 'text': "So we're just going to do a small simplification here.", 'start': 29779.119, 'duration': 2.442}], 'summary': '121 people picked multiple gender options, making analysis difficult.', 'duration': 27.765, 'max_score': 29753.796, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI29753796.jpg'}, {'end': 30013.148, 'src': 'embed', 'start': 29979.404, 'weight': 9, 'content': [{'end': 29980.204, 'text': 'So moving forward.', 'start': 29979.404, 'duration': 0.8}, {'end': 29985.951, 'text': 'Now, before we can ask any interesting questions about the survey responses.', 'start': 29981.229, 'duration': 4.722}, {'end': 29994.156, 'text': 'it would be helpful for us to understand what the demographics, which is, things like the country, age, gender, anything that you can use to.', 'start': 29985.951, 'duration': 8.205}, {'end': 30001.784, 'text': 'pick out groups from the responses, any such information, what the demographics of the respondents look like.', 'start': 29995.202, 'duration': 6.582}, {'end': 30013.148, 'text': "And it's important to explore these variables mainly in order to understand how representative the survey is of the worldwide programming community and of the worldwide population in general.", 'start': 30002.344, 'duration': 10.804}], 'summary': 'Understanding survey demographics is crucial to assess representativeness of worldwide programming community and population.', 'duration': 33.744, 'max_score': 29979.404, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI29979404.jpg'}, {'end': 30094.859, 'src': 'embed', 'start': 30067.142, 'weight': 6, 'content': [{'end': 30070.446, 'text': 'And also, like the language of the survey, the kind of 
questions that were asked,', 'start': 30067.142, 'duration': 3.304}, {'end': 30075.092, 'text': 'the length of the survey, all of these things make a big difference in terms of who has actually filled out the survey.', 'start': 30070.446, 'duration': 4.646}, {'end': 30084.624, 'text': 'And all this is called selection bias, where the respondents of a survey do not come from a randomly picked sample', 'start': 30075.453, 'duration': 9.171}, {'end': 30089.696, 'text': 'of the overall population that you want to study.', 'start': 30085.774, 'duration': 3.922}, {'end': 30091.637, 'text': 'So do keep that in mind.', 'start': 30090.277, 'duration': 1.36}, {'end': 30094.859, 'text': "And that's why it's important to first look at the demographics.", 'start': 30092.138, 'duration': 2.721}], 'summary': 'Survey design impacts selection bias in responses.', 'duration': 27.717, 'max_score': 30067.142, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI30067142.jpg'}, {'end': 30200.85, 'src': 'embed', 'start': 30173.712, 'weight': 2, 'content': [{'end': 30182.375, 'text': "That's quite wide, but it might be better to look at what the distribution of responses across countries looks like.", 'start': 30173.712, 'duration': 8.663}, {'end': 30186.976, 'text': 'Now we cannot plot the entire distribution for 183 countries.', 'start': 30183.015, 'duration': 3.961}, {'end': 30193.738, 'text': "So maybe what we'll do is we will simply look at the top 15 countries from where we had the maximum responses.", 'start': 30187.176, 'duration': 6.562}, {'end': 30200.85, 'text': 'And the way to do that is to take the survey_df.Country column and then use .value_counts() on it.', 'start': 30194.318, 'duration': 6.532}], 'summary': 'Analyze top 15 countries with maximum survey responses.', 'duration': 27.138, 'max_score': 30173.712, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI30173712.jpg'}], 'start': 29068.392, 'title': 'Survey data analysis', 'summary': 'Covers the process of selecting and analyzing a subset of columns from a survey data frame containing 65,000 responses to 60 questions, focusing on demographics, programming skills, and employment information. it also addresses data cleaning, pre-processing, bias in survey responses, and includes exploratory analysis and visualization of survey data.', 'chapters': [{'end': 29410.557, 'start': 29068.392, 'title': 'Data analysis with survey data', 'summary': 'Covers the process of selecting and analyzing a subset of columns from a survey data frame containing 65,000 responses to 60 questions, focusing on demographics, programming skills, and employment information, while addressing data type issues and converting non-numeric data into numeric data types.', 'duration': 342.165, 'highlights': ['The survey data frame contains about 65,000 responses to 60 questions, and the analysis is limited to demographics, programming skills, and employment information.', 'The process involves selecting a subset of 20 columns from the survey data frame to focus on specific aspects such as demographics, programming experience, and employment.', 'Issues with data types are identified, with the need to convert non-numeric data into numeric data types, such as using the PD.to_numeric function.']}, {'end': 29778.718, 'start': 29411.298, 'title': 'Data cleaning and analysis', 'summary': 'Covers data cleaning and analysis of a survey dataset, including replacing non-numeric values, handling unrealistic data, and dealing with multi-option gender responses.', 'duration': 367.42, 'highlights': ['The average age of the survey respondents is about 30, with an average age of first coding around 15 and an average of 12 years of coding experience.', 'Identifying unrealistic data, such as the minimum age being one and the 
maximum being 279, and addressing potential errors in survey responses.', 'Removing rows with age values higher than 100 or lower than 10, and handling work week hours by removing rows with values higher than 140.', 'Dealing with multi-option gender responses and the challenges it poses for analysis.']}, {'end': 30089.696, 'start': 29779.119, 'title': 'Data pre-processing and bias in survey responses', 'summary': 'Focuses on simplifying data by replacing multiple or all selected values with empty values, cleaning and preparing the dataset for analysis, and addressing the potential bias in survey responses due to non-random selection and outreach processes.', 'duration': 310.577, 'highlights': ['The process involves simplifying the data by replacing multiple or all selected values with empty values to aid in analysis.', 'Addressing the potential bias in survey responses due to non-random selection and outreach processes is essential to understand the representativeness of the survey.', 'Exploring variables such as country, age, and gender is important to understand the demographics of the respondents and the potential bias in the survey.']}, {'end': 30340.568, 'start': 30090.277, 'title': 'Survey data analysis and visualization', 'summary': 'Covers exploratory analysis and visualization of survey data, including the number of countries with responses, the top 15 countries with the highest responses, and creating a bar chart to visualize the distribution of responses.', 'duration': 250.291, 'highlights': ['Respondents from 183 countries have answered questions.', 'The top 15 countries with the highest number of responses include United States, India, and UK.', 'A bar chart was created to visualize the distribution of responses across the top 15 countries.']}], 'duration': 1272.176, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI29068392.jpg', 'highlights': ['The survey data frame contains about 65,000 
responses to 60 questions, focusing on demographics, programming skills, and employment information.', 'The average age of the survey respondents is about 30, with an average age of first coding around 15 and an average of 12 years of coding experience.', 'Respondents from 183 countries have answered questions.', 'The process involves selecting a subset of 20 columns from the survey data frame to focus on specific aspects such as demographics, programming experience, and employment.', 'The top 15 countries with the highest number of responses include United States, India, and UK.', 'Issues with data types are identified, with the need to convert non-numeric data into numeric data types, such as using the PD.to_numeric function.', 'Addressing the potential bias in survey responses due to non-random selection and outreach processes is essential to understand the representativeness of the survey.', 'Dealing with multi-option gender responses and the challenges it poses for analysis.', 'A bar chart was created to visualize the distribution of responses across the top 15 countries.', 'Exploring variables such as country, age, and gender is important to understand the demographics of the respondents and the potential bias in the survey.']}, {'end': 32940.316, 'segs': [{'end': 30391.031, 'src': 'embed', 'start': 30364.551, 'weight': 4, 'content': [{'end': 30371.096, 'text': 'And that already tells you that probably this survey is not really representative of programmers around the world.', 'start': 30364.551, 'duration': 6.545}, {'end': 30375.499, 'text': 'Maybe no, a large 12, 000 plus 8, 000 plus 4, 000.', 'start': 30371.756, 'duration': 3.743}, {'end': 30379.922, 'text': "So that's about 24, 000 out of 65.", 'start': 30375.499, 'duration': 4.423}, {'end': 30385.266, 'text': "So that's about more than one third, more than 40% of the responses have come from these three or four countries.", 'start': 30379.922, 'duration': 5.344}, {'end': 30391.031, 'text': 'And you might 
know if you think about this a little bit, it makes sense because one.', 'start': 30385.967, 'duration': 5.064}], 'summary': 'Survey not representative; 24,000 out of 65,000 responses from 3-4 countries.', 'duration': 26.48, 'max_score': 30364.551, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI30364551.jpg'}, {'end': 31050.391, 'src': 'embed', 'start': 31023.421, 'weight': 0, 'content': [{'end': 31027.665, 'text': "So the question is which of the following describes the highest level of formal education you've completed.", 'start': 31023.421, 'duration': 4.244}, {'end': 31037.74, 'text': "And now you can see here that it seems like out of the 65, 000 respondents, About 25, 000 more over 25, 000 hold a bachelor's degree.", 'start': 31028.286, 'duration': 9.454}, {'end': 31043.125, 'text': "And then, um, another close to 12, 000 hold a master's degree.", 'start': 31038.621, 'duration': 4.504}, {'end': 31048.509, 'text': 'And then there are a few more which hold a doctoral degree, probably about a 1500 or so.', 'start': 31043.805, 'duration': 4.704}, {'end': 31050.391, 'text': 'So all of these three combined.', 'start': 31049.13, 'duration': 1.261}], 'summary': "Out of 65,000 respondents, 25,000 hold a bachelor's, 12,000 hold a master's, and approximately 1500 hold a doctoral degree.", 'duration': 26.97, 'max_score': 31023.421, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI31023421.jpg'}, {'end': 31558.245, 'src': 'embed', 'start': 31534.589, 'weight': 1, 'content': [{'end': 31541.493, 'text': 'So 70% people are employed full-time among the respondents, but there are a fair number of students as well.', 'start': 31534.589, 'duration': 6.904}, {'end': 31542.514, 'text': 'So there are about 12% of students.', 'start': 31541.513, 'duration': 1.001}, {'end': 31550.798, 'text': 'What you might want to do is you might want to break down and then there are 
people who are not employed, but are looking for work.', 'start': 31544.932, 'duration': 5.866}, {'end': 31556.904, 'text': 'And then there are people who are freelancers, part-timers, and then the people who are just maybe their hobbies.', 'start': 31551.238, 'duration': 5.666}, {'end': 31558.245, 'text': "They're not really looking for work.", 'start': 31556.964, 'duration': 1.281}], 'summary': '70% are employed full-time, 12% are students, and various other employment statuses exist.', 'duration': 23.656, 'max_score': 31534.589, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI31534589.jpg'}, {'end': 31951.111, 'src': 'embed', 'start': 31925.18, 'weight': 2, 'content': [{'end': 31930.002, 'text': 'So we, now we get a column wise sum, and then we can sort those values in a descending order.', 'start': 31925.18, 'duration': 4.822}, {'end': 31932.383, 'text': "So that's going to give us these dev type totals.", 'start': 31930.402, 'duration': 1.981}, {'end': 31937.305, 'text': 'So now you see that we have developer backend developer, full stack developer, front end, and so on.', 'start': 31932.903, 'duration': 4.402}, {'end': 31939.786, 'text': 'So those seem to be the most common rules,', 'start': 31937.525, 'duration': 2.261}, {'end': 31947.229, 'text': "and it's not surprising that the stack overflow is primarily a tool used by developers and professional developers for.", 'start': 31939.786, 'duration': 7.443}, {'end': 31951.111, 'text': 'finding answers to small questions on writing code.', 'start': 31948.129, 'duration': 2.982}], 'summary': 'Column-wise sum and sorting reveal common developer roles on stack overflow.', 'duration': 25.931, 'max_score': 31925.18, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI31925180.jpg'}, {'end': 32339.55, 'src': 'embed', 'start': 32311.972, 'weight': 3, 'content': [{'end': 32315.235, 'text': "And let's visualize 
this once again, using a horizontal bar chart.", 'start': 32311.972, 'duration': 3.263}, {'end': 32319.145, 'text': 'So now it seems like the language is used in the past year.', 'start': 32316.604, 'duration': 2.541}, {'end': 32323.186, 'text': 'Once again, JavaScript was the most popular language followed by HTML CSS.', 'start': 32319.345, 'duration': 3.841}, {'end': 32328.167, 'text': 'And this is no surprise because today a lot of software has moved onto the web.', 'start': 32323.346, 'duration': 4.821}, {'end': 32331.148, 'text': 'Like you probably spend most of your time in the browser.', 'start': 32328.447, 'duration': 2.701}, {'end': 32334.989, 'text': "Even the Jupiter notebook platform that we're using is actually running in the browser.", 'start': 32331.568, 'duration': 3.421}, {'end': 32339.55, 'text': 'And the only way to write code in the browser is one, you have to write HTML CSS.', 'start': 32335.429, 'duration': 4.121}], 'summary': 'Javascript was the most popular language followed by html css in the past year.', 'duration': 27.578, 'max_score': 32311.972, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI32311972.jpg'}], 'start': 30341.786, 'title': 'Survey analysis and programming insights', 'summary': 'Discusses the survey analysis, revealing country distribution, age range, programming community insights, educational backgrounds, employment types, roles, popular programming languages in 2020, and visualizations for insights.', 'chapters': [{'end': 30636.369, 'start': 30341.786, 'title': 'Survey analysis: country distribution and age range', 'summary': "Discusses the disproportionate number of responses from united states, india, and united kingdom, representing more than 40% of the total responses, and the distribution of respondents' age, with the bulk of responses in the 20-45 years range.", 'duration': 294.583, 'highlights': ['The survey is not representative of programmers around the 
world as more than 40% of the responses came from United States, India, and United Kingdom, and the survey was conducted only in English.', 'The distribution of age shows that the majority of responses are in the 20-45 years range, reflecting the professional lifespan of a programmer, but also includes responses from individuals up to 80 years of age.']}, {'end': 30928.974, 'start': 30636.869, 'title': 'Analyzing programming community survey data', 'summary': 'Discusses analyzing survey data to identify representation of age groups and genders in the programming community, revealing a significant skew towards men and a lack of diversity, and proposes exploring education levels and salaries across genders as part of an exploratory data analysis project.', 'duration': 292.105, 'highlights': ['The analysis reveals a significant skew towards men in the programming community, with around 92% of the respondents being men, while only about 8% are women or non-binary.', 'The overall percentage of women in the programming community is estimated to be about 12%, indicating a lack of diversity and representation in the programming community.', 'The chapter proposes an exploratory data analysis project to compare survey responses and preferences across genders, including exploring education levels and salaries to identify gender disparities.']}, {'end': 31490.969, 'start': 30928.974, 'title': 'Programming education insights', 'summary': "Explores the educational backgrounds of programmers, revealing that over half of the 65,000 respondents hold a bachelor's or master's degree, with over 60% having studied computer science or related fields, and emphasizes the importance of formal education while highlighting the diverse pathways into programming careers.", 'duration': 561.995, 'highlights': ["Over half of the 65,000 respondents hold a bachelor's or master's degree.", 'Over 60% of respondents studied computer science or related fields.', 'Close to 40% of programmers holding a 
college degree have a field of study other than computer science, highlighting diverse educational backgrounds.']}, {'end': 31806.938, 'start': 31492.028, 'title': 'Data analysis: employment and roles', 'summary': 'Focuses on analyzing employment types and roles of respondents, revealing that 70% are employed full-time, 12% are students, and at least 10% are working independently. it also addresses the challenge of processing multiple role selections and proposes a method to split the data into multiple columns.', 'duration': 314.91, 'highlights': ['70% of respondents are employed full-time, while 12% are students.', 'At least 10% of the employed respondents are working independently as contractors or freelancers.', 'Challenge of analyzing multiple role selections and the proposal to split the data into multiple columns.']}, {'end': 32046.391, 'start': 31806.938, 'title': 'Data analysis and visualization', 'summary': 'Discusses splitting a column into a data frame, identifying the most common job roles, and suggests exploring the dataset for insights, with a focus on data science roles and gender diversity.', 'duration': 239.453, 'highlights': ['The function split multi-column is used to split a series into a data frame, resulting in 23 columns representing job roles with true or false values.', 'The most common job roles are identified by counting the number of true values in each column and sorting the sums in descending order.', 'Encouragement to explore and visualize the dataset for further insights, particularly focusing on data science roles and gender diversity.']}, {'end': 32290.338, 'start': 32046.551, 'title': 'Popular programming languages in 2020', 'summary': 'Explores the most popular programming languages in 2020 by analyzing a survey conducted in february 2020, and utilizes data frame operations to identify the percentages and plot them as a bar chart.', 'duration': 243.787, 'highlights': ['The survey was conducted in February 2020, and the data 
represents 2019 data.', '25 languages were presented to the respondents, and the data frame has 25 columns for each language.', 'The analysis uses data frame operations to calculate the percentages of respondents who have selected specific programming languages and plots them as a bar chart.']}, {'end': 32940.316, 'start': 32291.434, 'title': 'Programming language analysis', 'summary': "Discusses the popularity of programming languages based on usage percentages, visualization through bar charts, and highlights javascript, html, css, sql, and python as the most popular languages, emphasizing javascript's dominance in web development and python's widespread use in non-web-related development and data science.", 'duration': 648.882, 'highlights': ['JavaScript is the most popular language followed by HTML, CSS, and SQL.', 'Python is the most popular language in non-web-related development and data science, surpassing Java.', 'Python is the most sought-after language for learning, followed by JavaScript.', 'Rust emerges as one of the most loved languages, gaining popularity despite a smaller user base.']}], 'duration': 2598.53, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI30341786.jpg', 'highlights': ["Over half of the 65,000 respondents hold a bachelor's or master's degree.", '70% of respondents are employed full-time, while 12% are students.', 'The most common job roles are identified by counting the number of true values in each column and sorting the sums in descending order.', 'JavaScript is the most popular language followed by HTML, CSS, and SQL.', 'The survey is not representative of programmers around the world as more than 40% of the responses came from United States, India, and United Kingdom, and the survey was conducted only in English.']}, {'end': 34279.203, 'segs': [{'end': 33006.845, 'src': 'embed', 'start': 32940.856, 'weight': 0, 'content': [{'end': 32942.396, 'text': 'And I hope you feel the 
same way.', 'start': 32940.856, 'duration': 1.54}, {'end': 32945.458, 'text': "Um, so that's about the most loved languages.", 'start': 32943.377, 'duration': 2.081}, {'end': 32947.159, 'text': 'We now have some insights about that.', 'start': 32945.517, 'duration': 1.642}, {'end': 32950.1, 'text': 'The next exercise,', 'start': 32948.239, 'duration': 1.861}, {'end': 32954.743, 'text': 'a simple exercise that you can try here is to identify the most dreaded languages, which is,', 'start': 32950.1, 'duration': 4.643}, {'end': 32960.324, 'text': 'languages which people have used in the past year but do not want to learn or use over the next year.', 'start': 32954.743, 'duration': 5.581}, {'end': 32962.426, 'text': "There's a small hint here.", 'start': 32961.446, 'duration': 0.98}, {'end': 32967.689, 'text': 'What you can do is you can simply invert the languages interested data frame.', 'start': 32962.586, 'duration': 5.103}, {'end': 32972.131, 'text': 'So if you, and the way to invert it is using this tilde operator.', 'start': 32968.148, 'duration': 3.983}, {'end': 32975.999, 'text': 'So just invert that data frame and then do the same thing that we did here.', 'start': 32973.117, 'duration': 2.882}, {'end': 32984.144, 'text': 'So you should be able to answer the most dreaded language and then see if you get the same result as what the Stack Overflow results present.', 'start': 32976.439, 'duration': 7.705}, {'end': 32985.686, 'text': 'So you can always refer to them.', 'start': 32984.464, 'duration': 1.222}, {'end': 32994.35, 'text': 'So moving further along next question here is in which countries do developers work the highest number of hours per week? 
Okay.', 'start': 32986.847, 'duration': 7.503}, {'end': 33002.92, 'text': 'Now to do this question, we will, you need to use the group by a function of the group by function of a data frame, the group by method.', 'start': 32994.711, 'duration': 8.209}, {'end': 33006.845, 'text': "And there's a small caveat here.", 'start': 33004.161, 'duration': 2.684}], 'summary': 'Analyzing most dreaded languages and developer work hours per week.', 'duration': 65.989, 'max_score': 32940.856, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI32940856.jpg'}, {'end': 33263.917, 'src': 'embed', 'start': 33234.949, 'weight': 1, 'content': [{'end': 33240.232, 'text': 'So now we have the high response countries and these are the 15 countries with the highest number of working hours.', 'start': 33234.949, 'duration': 5.283}, {'end': 33246.595, 'text': 'It seems, some South Asian countries, some Asian countries like Iran, Israel and China, have the highest working hours,', 'start': 33240.451, 'duration': 6.144}, {'end': 33248.737, 'text': 'which is followed by United States.', 'start': 33246.595, 'duration': 2.142}, {'end': 33251.398, 'text': "So that's interesting.", 'start': 33249.917, 'duration': 1.481}, {'end': 33252.4, 'text': 'And then we have Greece.', 'start': 33251.559, 'duration': 0.841}, {'end': 33254.921, 'text': 'So people are probably working a lot.', 'start': 33253.36, 'duration': 1.561}, {'end': 33256.362, 'text': 'Programmers are working a lot in Greece.', 'start': 33254.961, 'duration': 1.401}, {'end': 33258.724, 'text': 'And once again, we, we see a bunch of Asian countries.', 'start': 33256.462, 'duration': 2.262}, {'end': 33263.917, 'text': 'All the way till actually a major majority of these seem to be Asian countries.', 'start': 33259.754, 'duration': 4.163}], 'summary': 'The top 15 countries with highest working hours include south asian and asian countries such as iran, israel, china, and the united states, 
with a majority being asian countries.', 'duration': 28.968, 'max_score': 33234.949, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI33234949.jpg'}, {'end': 33781.438, 'src': 'embed', 'start': 33754.879, 'weight': 7, 'content': [{'end': 33761.124, 'text': 'And maybe go through the matplotlib gallery, go through the Seaborn gallery and try and pick which might be an interesting graph to draw there.', 'start': 33754.879, 'duration': 6.245}, {'end': 33763.825, 'text': 'So these are all different exercises for you to try.', 'start': 33761.544, 'duration': 2.281}, {'end': 33776.735, 'text': "Data analysis by itself is a there's a lot of depth in the field and you can probably spend at least a few months just exploring different ways to slice and dice and analyze and visualize the data.", 'start': 33763.845, 'duration': 12.89}, {'end': 33778.516, 'text': 'So please do that.', 'start': 33777.775, 'duration': 0.741}, {'end': 33781.438, 'text': 'Okay The best way to learn is by doing.', 'start': 33779.456, 'duration': 1.982}], 'summary': 'Explore matplotlib and seaborn galleries for data visualization exercises, emphasizing learning by doing.', 'duration': 26.559, 'max_score': 33754.879, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI33754879.jpg'}, {'end': 33822.139, 'src': 'embed', 'start': 33798.142, 'weight': 4, 'content': [{'end': 33804.505, 'text': 'Although it definitely has fewer responses from programmers in non-English speaking countries and from women and non-binary genders.', 'start': 33798.142, 'duration': 6.363}, {'end': 33812.271, 'text': 'We have also learned that the programming community is probably not as diverse as it can be in terms of gender, in terms of age,', 'start': 33805.505, 'duration': 6.766}, {'end': 33817.355, 'text': 'maybe in terms of the different languages or the different countries that are there.', 'start': 33812.271, 
'duration': 5.084}, {'end': 33822.139, 'text': 'So we should probably take more efforts and support and encourage members of underrepresented communities.', 'start': 33817.375, 'duration': 4.764}], 'summary': 'Programming community lacks diversity in gender and language representation.', 'duration': 23.997, 'max_score': 33798.142, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI33798142.jpg'}, {'end': 33885.7, 'src': 'embed', 'start': 33856.394, 'weight': 5, 'content': [{'end': 33862.422, 'text': 'And this can be a great way for you to break into the field, not just in programming, but also in data science, which are very closely related fields.', 'start': 33856.394, 'duration': 6.028}, {'end': 33868.087, 'text': 'We learned that JavaScript and HTML are the most popular programming languages used in 2020.', 'start': 33863.444, 'duration': 4.643}, {'end': 33871.49, 'text': 'And then we learned that Python is the language and most people are interested in learning.', 'start': 33868.087, 'duration': 3.403}, {'end': 33877.974, 'text': "And we've learned that Rust and TypeScript are the most loved languages, both of which are small, but fast growing communities.", 'start': 33871.97, 'duration': 6.004}, {'end': 33885.7, 'text': 'And finally, it seems like programmers around the world seem to be working on 40 hours on average, but there are slight variations by country.', 'start': 33878.715, 'duration': 6.985}], 'summary': 'In 2020, javascript and html were the most popular programming languages, python was the most sought after, and rust and typescript were the most loved. 
programmers work an average of 40 hours per week globally.', 'duration': 29.306, 'max_score': 33856.394, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI33856394.jpg'}, {'end': 33916.42, 'src': 'embed', 'start': 33889.122, 'weight': 6, 'content': [{'end': 33895.026, 'text': "You can learn and start programming professionally at any age, and you're likely to have a long and fulfilling career.", 'start': 33889.122, 'duration': 5.904}, {'end': 33900.41, 'text': "If you also enjoy programming as a hobby, especially it's going to help you during the first few years.", 'start': 33895.086, 'duration': 5.324}, {'end': 33901.751, 'text': 'All right.', 'start': 33901.47, 'duration': 0.281}, {'end': 33904.232, 'text': "So that's our analysis.", 'start': 33902.171, 'duration': 2.061}, {'end': 33909.876, 'text': "And as I said, there's a wealth of information to be discovered and we've like barely scratched the surface.", 'start': 33904.733, 'duration': 5.143}, {'end': 33912.138, 'text': 'So there are a few more ideas that I wanted to share with you.', 'start': 33909.896, 'duration': 2.242}, {'end': 33916.42, 'text': 'You can repeat the analysis for different age groups and genders and compare the results.', 'start': 33912.598, 'duration': 3.822}], 'summary': 'Learning programming at any age leads to a fulfilling career. hobbyists benefit in the early years. 
further analysis can be done by age and gender.', 'duration': 27.298, 'max_score': 33889.122, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI33889122.jpg'}, {'end': 33992.925, 'src': 'embed', 'start': 33970.489, 'weight': 10, 'content': [{'end': 33981.017, 'text': "You will see that there is definitely a pay gap and you can try and validate that and try to compare the results of this year's survey with the previous year and identify some interesting trends,", 'start': 33970.489, 'duration': 10.528}, {'end': 33983.379, 'text': 'because this is data that you get every year.', 'start': 33981.017, 'duration': 2.362}, {'end': 33992.925, 'text': 'And once again, you can go back to this link stackoverflow.com insights.stackoverflow.com and you can download the raw data for every year.', 'start': 33984.72, 'duration': 8.205}], 'summary': 'Pay gap evident in survey data, track trends using stackoverflow.com insights', 'duration': 22.436, 'max_score': 33970.489, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI33970489.jpg'}, {'end': 34053.927, 'src': 'embed', 'start': 34030.854, 'weight': 8, 'content': [{'end': 34040.068, 'text': "So that's a real sign that you've done something significant in data analysis, and you can proudly then showcase that on your professional profile.", 'start': 34030.854, 'duration': 9.214}, {'end': 34044.34, 'text': 'Then we have this, I just want to share a few references.', 'start': 34041.238, 'duration': 3.102}, {'end': 34047.462, 'text': "Now we've used pandas, matplotlib and seaborn.", 'start': 34044.48, 'duration': 2.982}, {'end': 34050.444, 'text': "So you can refer to the previous lectures we've linked.", 'start': 34047.522, 'duration': 2.922}, {'end': 34053.927, 'text': 'You just go to zero to pandas.com and you can find the lectures there.', 'start': 34050.764, 'duration': 3.163}], 'summary': 'Data analysis milestone achieved 
with pandas, matplotlib, and seaborn. access lectures at zerotopandas.com', 'duration': 23.073, 'max_score': 34030.854, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI34030854.jpg'}, {'end': 34098.369, 'src': 'embed', 'start': 34071.658, 'weight': 9, 'content': [{'end': 34078.3, 'text': 'And finally, as I told you, we are creating this opendatasets library Python package,', 'start': 34071.658, 'duration': 6.642}, {'end': 34081.721, 'text': 'which is a curated collection of datasets for data analysis and machine learning.', 'start': 34078.3, 'duration': 3.421}, {'end': 34087.963, 'text': 'So far we have about six, seven datasets, but we will, we are planning to add about a hundred datasets here.', 'start': 34082.281, 'duration': 5.682}, {'end': 34092.804, 'text': "So over the next few days, and we've released this library just yesterday.", 'start': 34088.403, 'duration': 4.401}, {'end': 34098.369, 'text': "And it's something that we worked up quickly to make sure that it's easy for you to download these datasets.", 'start': 34093.604, 'duration': 4.765}], 'summary': 'Creating opendatasets library python package with 6-7 datasets, planning to add 100 more.', 'duration': 26.711, 'max_score': 34071.658, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI34071658.jpg'}, {'end': 34169.24, 'src': 'embed', 'start': 34144.593, 'weight': 11, 'content': [{'end': 34151.655, 'text': "And then you ask and answer interesting questions about the data and an optional, but highly recommended step because you've put in so much work.", 'start': 34144.593, 'duration': 7.062}, {'end': 34158.316, 'text': 'And if you can just consolidate all of your learning into a blog post to showcase your work, that is something that you can do as well.', 'start': 34151.795, 'duration': 6.521}, {'end': 34161.637, 'text': 'So I just want to give you a quick overview of the course 
project.', 'start': 34158.996, 'duration': 2.641}, {'end': 34165.779, 'text': 'And then we have a few exciting things to close out.', 'start': 34162.777, 'duration': 3.002}, {'end': 34169.24, 'text': 'So now the course project, this is a starter notebook.', 'start': 34166.819, 'duration': 2.421}], 'summary': 'Course project involves asking questions about data and showcasing work in a blog post.', 'duration': 24.647, 'max_score': 34144.593, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI34144593.jpg'}], 'start': 32940.856, 'title': 'Analyzing programming languages and work week hours', 'summary': 'Delves into identifying the most dreaded programming languages using data inversion and comparing the results with stack overflow data, and demonstrates using pandas group by function to find the average work week hours by country, considering only countries with more than 250 responses, and then filtering the top 15 countries with the highest average work week hours. 
it also explores the average working hours of programmers, the potential impact of age on starting a career in programming, and emphasizes the importance of data analysis in identifying trends, comparisons, and insights from diverse datasets.', 'chapters': [{'end': 32984.144, 'start': 32940.856, 'title': 'Most dreaded programming languages', 'summary': 'Explores the concept of identifying the most dreaded programming languages by utilizing data inversion and comparing the results with stack overflow data.', 'duration': 43.288, 'highlights': ['By inverting the languages interested data frame using the tilde operator, one can identify the most dreaded programming languages.', 'The exercise involves identifying languages that people have used in the past year but do not want to learn or use over the next year.', 'The chapter provides a hint on how to approach the exercise and compares the results with Stack Overflow data.']}, {'end': 33233.828, 'start': 32984.464, 'title': 'Average work week hours by country', 'summary': 'Demonstrates using pandas group by function to find the average work week hours by country, considering only countries with more than 250 responses, and then filtering the top 15 countries with the highest average work week hours.', 'duration': 249.364, 'highlights': ['Using group by function to find average work week hours by country', 'Filtering top 15 countries with the highest average work week hours', 'Encouraging step-by-step learning and understanding']}, {'end': 33909.876, 'start': 33234.949, 'title': 'Average working hours and career in programming', 'summary': 'Explores the average working hours of programmers, the potential impact of age on starting a career in programming, and concludes that the programming community lacks diversity, with insights into popular programming languages, working patterns, and the potential for a fulfilling career regardless of age.', 'duration': 674.927, 'highlights': ['The programming community lacks 
diversity, particularly in terms of gender, age, and country representation, and there is a need to support and encourage members of underrepresented communities.', 'JavaScript and HTML are the most popular programming languages used in 2020, with Python being the most sought-after language for learning, while Rust and TypeScript are the most loved languages, reflecting the growing communities around these languages.', 'Programmers around the world seem to work an average of 40 hours per week, with slight variations by country, and there is potential for a long and fulfilling career in programming regardless of age, with the enjoyment of programming as a hobby aiding in the initial years of professional programming.']}, {'end': 34279.203, 'start': 33909.896, 'title': 'Course project and data analysis', 'summary': 'Emphasizes the importance of data analysis in identifying trends, comparisons, and insights from diverse datasets, while also highlighting the significance of replicating survey results and showcasing the use of pandas, matplotlib, and seaborn in the analysis process.', 'duration': 369.307, 'highlights': ['The chapter emphasizes the importance of data analysis in identifying trends, comparisons, and insights from diverse datasets.', 'The significance of replicating survey results and showcasing the use of pandas, matplotlib, and seaborn in the analysis process.', 'The introduction of the open datasets library Python package and its potential for course projects.']}], 'duration': 1338.347, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI32940856.jpg', 'highlights': ['Using group by function to find average work week hours by country', 'Filtering top 15 countries with the highest average work week hours', 'By inverting the languages interested data frame using the tilde operator, one can identify the most dreaded programming languages', 'The exercise involves identifying languages that people have 
used in the past year but do not want to learn or use over the next year', 'The programming community lacks diversity, particularly in terms of gender, age, and country representation, and there is a need to support and encourage members of underrepresented communities', 'JavaScript and HTML are the most popular programming languages used in 2020, with Python being the most sought-after language for learning, while Rust and TypeScript are the most loved languages, reflecting the growing communities around these languages', 'Programmers around the world seem to work an average of 40 hours per week, with slight variations by country, and there is potential for a long and fulfilling career in programming regardless of age, with the enjoyment of programming as a hobby aiding in the initial years of professional programming', 'The chapter emphasizes the importance of data analysis in identifying trends, comparisons, and insights from diverse datasets', 'The significance of replicating survey results and showcasing the use of pandas, matplotlib, and seaborn in the analysis process', 'The introduction of the open datasets library Python package and its potential for course projects', 'The chapter provides a hint on how to approach the exercise and compares the results with Stack Overflow data', 'Encouraging step-by-step learning and understanding']}, {'end': 35781.512, 'segs': [{'end': 34354.777, 'src': 'embed', 'start': 34323.43, 'weight': 3, 'content': [{'end': 34326.012, 'text': 'So here we now run conda activate course project.', 'start': 34323.43, 'duration': 2.582}, {'end': 34329.355, 'text': 'So now the environment has been activated inside this environment.', 'start': 34326.132, 'duration': 3.223}, {'end': 34331.858, 'text': 'We might want to install all the libraries that we want to use.', 'start': 34329.395, 'duration': 2.463}, {'end': 34333.419, 'text': "We're going to use Jovian.", 'start': 34332.418, 'duration': 1.001}, {'end': 34334.82, 'text': "We're going 
to use Jupyter.", 'start': 34333.699, 'duration': 1.121}, {'end': 34341.146, 'text': "Let's say we'll use opendatasets or you don't have to, but you might, we are going to use pandas.", 'start': 34334.94, 'duration': 6.206}, {'end': 34345.37, 'text': "We are going to use numpy and we are going to use seaborn and we're going to use matplotlib.", 'start': 34341.186, 'duration': 4.184}, {'end': 34349.233, 'text': 'So we just install all the libraries after activating the environment.', 'start': 34345.93, 'duration': 3.303}, {'end': 34354.777, 'text': "And once these libraries are installed, let's just give that a second.", 'start': 34350.514, 'duration': 4.263}], 'summary': 'Activate environment, install libraries: jovian, jupyter, pandas, numpy, seaborn, matplotlib.', 'duration': 31.347, 'max_score': 34323.43, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI34323430.jpg'}, {'end': 34774.299, 'src': 'embed', 'start': 34745.64, 'weight': 1, 'content': [{'end': 34749.902, 'text': 'So we will evaluate that your dataset contains at least three columns and 150 rows of data.', 'start': 34745.64, 'duration': 4.262}, {'end': 34753.704, 'text': 'We will, you must ask and answer at least five questions about the dataset.', 'start': 34750.163, 'duration': 3.541}, {'end': 34757.206, 'text': 'Your submission should also include at least five visualizations.', 'start': 34754.184, 'duration': 3.022}, {'end': 34764.752, 'text': 'And your submission should include explanations using Markdown cells apart from just code, right? 
So just code is not good enough.', 'start': 34757.706, 'duration': 7.046}, {'end': 34766.093, 'text': 'Please write explanations.', 'start': 34764.852, 'duration': 1.241}, {'end': 34770.736, 'text': 'And that helps you understand your data that helps you gather insights from your data.', 'start': 34766.553, 'duration': 4.183}, {'end': 34774.299, 'text': 'And it is also going to help others like tomorrow.', 'start': 34771.117, 'duration': 3.182}], 'summary': 'Evaluate dataset: 3 columns, 150 rows. 5 questions, 5 visualizations, explanations using markdown.', 'duration': 28.659, 'max_score': 34745.64, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI34745640.jpg'}, {'end': 34917.581, 'src': 'embed', 'start': 34890.453, 'weight': 2, 'content': [{'end': 34895.475, 'text': 'So do share your work on the forum and browse through projects by other participants and maybe give feedback.', 'start': 34890.453, 'duration': 5.022}, {'end': 34897.475, 'text': 'And that will also be a great way for you to learn.', 'start': 34895.615, 'duration': 1.86}, {'end': 34904.137, 'text': 'When you see other people creating visualizations, similar or different visualizations, you will get to learn from their code.', 'start': 34897.935, 'duration': 6.202}, {'end': 34906.798, 'text': 'You will get to learn from their analysis and so on.', 'start': 34904.157, 'duration': 2.641}, {'end': 34907.478, 'text': 'So please do that.', 'start': 34906.838, 'duration': 0.64}, {'end': 34914.04, 'text': 'And one highly recommended step is to write a blog post and a blog post is a great way to present and showcase your work.', 'start': 34908.158, 'duration': 5.882}, {'end': 34917.581, 'text': 'So you can sign up on medium.com to write a blog post for your project.', 'start': 34914.4, 'duration': 3.181}], 'summary': "Engage with the forum, learn from others' visualizations, and write a blog post to showcase your project.", 'duration': 27.128, 
'max_score': 34890.453, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI34890453.jpg'}, {'end': 35393.108, 'src': 'embed', 'start': 35366.394, 'weight': 0, 'content': [{'end': 35374.136, 'text': 'So it will be a page on jovian.ml, a part of your profile where it can be displayed, and this will be available for download as a PDF as well.', 'start': 35366.394, 'duration': 7.742}, {'end': 35377.057, 'text': 'So if you want to download it, print it out, you can do that too.', 'start': 35374.236, 'duration': 2.821}, {'end': 35381.96, 'text': 'And you will be able to add it on LinkedIn onto your LinkedIn profile,', 'start': 35377.757, 'duration': 4.203}, {'end': 35388.005, 'text': 'so that anybody who visits your profile will see that you have completed a certification on data analysis with Python.', 'start': 35381.96, 'duration': 6.045}, {'end': 35393.108, 'text': 'And you will also be able to share it online on Twitter, Facebook, or wherever.', 'start': 35388.065, 'duration': 5.043}], 'summary': 'Create a page on jovian.ml for displaying and downloading a pdf of a data analysis certification, shareable on linkedin, twitter, and facebook.', 'duration': 26.714, 'max_score': 35366.394, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI35366394.jpg'}], 'start': 34279.204, 'title': 'Maximizing project impact and learning', 'summary': 'Emphasizes sharing project work, engaging with the community, and recapping course content, while also discussing earning a data analysis certification and enhancing professional profile with jovian.ml.', 'chapters': [{'end': 34467.593, 'start': 34279.204, 'title': 'Setting up anaconda environment for python libraries', 'summary': 'Demonstrates the process of creating and activating a conda environment, installing essential libraries like jovian, jupyter, pandas, numpy, seaborn, and matplotlib, and launching a jupyter notebook, 
providing guidance on the steps required for the setup and highlighting the benefits of using binder.', 'duration': 188.389, 'highlights': ['The chapter provides instructions for creating and activating a conda environment, installing essential libraries like Jovian, Jupiter, pandas, numpy, seaborn, and matplotlib, and launching a Jupyter notebook, streamlining the setup process for data analysis projects.', 'Recommendation of using binder for a one-click experience in setting up the environment and installing libraries, emphasizing its convenience compared to the manual setup process, which might take a longer time.', 'Guidance on the steps required for setting up the environment, including installing Anaconda, cloning the notebook using the Jovian clone command, creating and activating a conda environment, installing libraries using pip install, and launching a Jupyter notebook, providing a comprehensive overview of the setup process.']}, {'end': 34845.978, 'start': 34468.792, 'title': 'Data analysis project guidelines', 'summary': 'Explains the process of selecting a real-world dataset, performing data preparation, exploratory analysis and visualization, asking and answering questions about the data, writing conclusions, and making a submission, emphasizing the importance of explanations and avoiding plagiarism, with evaluation criteria including a minimum of three columns and 150 rows of data, five questions and visualizations, and explanations using markdown cells.', 'duration': 377.186, 'highlights': ['The chapter explains the process of selecting a real-world dataset, performing data preparation, exploratory analysis and visualization, asking and answering questions about the data, writing conclusions, and making a submission.', 'The importance of explanations and avoiding plagiarism is emphasized, with evaluation criteria including a minimum of three columns and 150 rows of data, five questions and visualizations, and explanations using Markdown cells.', 
'The chapter stresses the significance of presenting the analysis well, making interesting observations, and avoiding plagiarism.']}, {'end': 35350.389, 'start': 34846.118, 'title': 'Maximizing project impact and learning', 'summary': 'Emphasizes the importance of sharing project work on social media and the forum, writing a blog post, and engaging with the course community for feedback and learning, while also providing a quick recap of the course content and encouraging active participation in the forum to complete the course.', 'duration': 504.271, 'highlights': ['The chapter emphasizes the importance of sharing project work on social media and the forum, writing a blog post, and engaging with the course community for feedback and learning.', 'Provides a quick recap of the course content, including the topics covered such as Python programming, NumPy, pandas, visualization with Matplotlib and Seaborn, and exploratory data analysis.', 'Encourages active participation in the forum for asking questions and getting help, highlighting the contributions of the course community in answering questions and providing support.']}, {'end': 35781.512, 'start': 35351.029, 'title': 'Data analysis certification and professional profile building', 'summary': "Discusses the process of earning a certification in data analysis with python, emphasizing the significance of showcasing a substantial project and enhancing one's professional profile, including utilizing jovian.ml for hosting and sharing work.", 'duration': 430.483, 'highlights': ["Earning a certification in data analysis with Python requires completing all assignments and the course project, obtaining a pass grade, and results in a verified certificate hosted on joven.ml, which can be displayed on one's profile, downloaded as a PDF, and shared on LinkedIn, Twitter, and Facebook.", 'Emphasizing the need for a substantial data analysis project and the importance of documenting, presenting, and adding it to public profiles 
to improve professional profile.', 'Encouraging the creation of blog posts, tutorials, and guides, and sharing resources with the community, such as classmates or colleagues, to enhance knowledge and professional profile.', 'Utilizing Jovian.ml to build a professional data science profile, including adding current designation, university or company, linking GitHub profile, and creating collections of notebooks, as well as engaging on the forum to showcase expertise.', 'Encouraging the avoidance of jumping to another course without producing real output and highlighting the significance of learning by doing good projects in data science and machine learning.']}], 'duration': 1502.308, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GPVsHOlRBBI/pics/GPVsHOlRBBI34279204.jpg', 'highlights': ["Earning a certification in data analysis with Python requires completing all assignments and the course project, obtaining a pass grade, and results in a verified certificate hosted on joven.ml, which can be displayed on one's profile, downloaded as a PDF, and shared on LinkedIn, Twitter, and Facebook.", 'The chapter explains the process of selecting a real-world dataset, performing data preparation, exploratory analysis and visualization, asking and answering questions about the data, writing conclusions, and making a submission.', 'The chapter emphasizes the importance of sharing project work on social media and the forum, writing a blog post, and engaging with the course community for feedback and learning.', 'The chapter provides instructions for creating and activating a conda environment, installing essential libraries like Jovian, Jupiter, pandas, numpy, seaborn, and matplotlib, and launching a Jupyter notebook, streamlining the setup process for data analysis projects.']}], 'highlights': ['The course offers a practical beginner-friendly introduction to data analysis with Python and provides a verified certificate of accomplishment upon 
completion.', 'Covers Jupyter notebook basics, including tips for beginners and Markdown formatting.', 'The while loop in Python runs a set of statements repeatedly while a given condition holds, exemplified by computing the factorial of 100 with a while loop in a few lines of code, which takes 2 milliseconds to run.', 'The EMI for the 10-year loan is smaller than that for the 8-year loan, as expected.', 'Importing built-in modules from the Python standard library, such as the math module, provides access to a variety of math-related operations.', 'Approximate yield of 56.8 tons per hectare for the Kanto region.', 'Performing matrix multiplication to predict yields for 10,000 data rows using a given set of weights (0.3, 0.2, 0.5).', 'The np.genfromtxt function is used to load the climate data file, which contains about 10,000 data points.', 'The chapter covers various methods for creating NumPy arrays, including np.array, np.zeros, np.ones, np.random, np.arange, reshaping arrays, and np.linspace.', "The function 'loan_emi' is defined to calculate the equal monthly installment for the repayment of a loan based on the loan amount, duration, rate of interest, and down payment.", 'The dataset contains information about 150 flowers and includes measurements of sepal length, sepal width, petal length, petal width, and the species of the flower.', 'The chapter covers techniques for creating various types of plots, including scatter plots, line graphs, histograms, bar plots, heat maps, and images using matplotlib and seaborn.', 'The final case study on exploratory data analysis involves analyzing a real-world dataset and drawing conclusions, with the result showcased on a professional profile.', 'The survey data frame contains about 65,000 responses to 60 questions, focusing on demographics, programming skills, and employment information.', "Over half of the 65,000 respondents hold a bachelor's or master's 
degree.", "Earning a certification in data analysis with Python requires completing all assignments and the course project and obtaining a pass grade; it results in a verified certificate hosted on jovian.ml, which can be displayed on one's profile, downloaded as a PDF, and shared on LinkedIn, Twitter, and Facebook."]}