title
Air Quality Index Prediction - Data Collection Part 1

description
Hello guys, this is the first video of the project. Please join the Rs 799 membership program to access the remaining videos, which I will be uploading from today. Joining my channel as a member also gets you additional benefits like Data Science materials, members-only live streams, and much more: https://www.youtube.com/channel/UCNU_lfiiWBdtULKOw6X0Dig/join
GitHub URL: https://github.com/krishnaik06/AQI-Project
Connect with me here:
Twitter: https://twitter.com/Krishnaik06
Facebook: https://www.facebook.com/krishnaik06
Instagram: https://www.instagram.com/krishnaik06

detail
{'title': 'Air Quality Index Prediction- Data Collection Part 1', 'heatmap': [{'end': 599.877, 'start': 559.323, 'weight': 0.828}, {'end': 1020.719, 'start': 994.992, 'weight': 0.704}, {'end': 1640.318, 'start': 1571.267, 'weight': 1}], 'summary': 'Provides an overview of a live project on air quality index, emphasizing comprehensive data science project lifecycle coverage and practical coding examples, including completing one project every month, and discusses a supervised machine learning project to predict air quality index, requiring 10 to 15 days for completion. it also demonstrates web scraping techniques to collect air quality index and weather data for bangalore, india from 2013 to 2018, emphasizing the importance of code customization for data retrieval, dynamic url generation, and storing html data with minimal code.', 'chapters': [{'end': 226.034, 'segs': [{'end': 169.429, 'src': 'embed', 'start': 25.364, 'weight': 0, 'content': [{'end': 28.845, 'text': "we'll discuss the whole life cycle of a data science project.", 'start': 25.364, 'duration': 3.481}, {'end': 36.228, 'text': "so, as usual, the first step, we will be going with data collection and we will be using techniques like we'll be using web scrapping,", 'start': 28.845, 'duration': 7.383}, {'end': 45.211, 'text': "we'll be using third-party apis to collect the data and after that, in the second step, what we'll do is that we'll pre-process this particular data.", 'start': 36.228, 'duration': 8.983}, {'end': 49.733, 'text': 'and remember guys, this data collection for air quality index.', 'start': 46.151, 'duration': 3.582}, {'end': 53.215, 'text': 'right, we are trying to solve the project air quality index.', 'start': 49.733, 'duration': 3.482}, {'end': 59.559, 'text': 'so data collection will be from multiple resources, from multiple sources.', 'start': 53.215, 'duration': 6.344}, {'end': 61.68, 'text': "we'll be collecting this and every.", 'start': 59.559, 'duration': 2.121}, {'end': 68.424, 'text': "i'll try to write each and every line of code so that you can also practice and you can understand about this.", 'start': 61.68, 'duration': 6.744}, {'end': 74.168, 'text': "so in the second step, we'll be pre-processing the data and over here we'll be doing various tasks like teacher engineering.", 'start': 68.424, 'duration': 5.744}, {'end': 81.051, 'text': 'okay, and remember guys, today, in this particular video on the day one, we are basically just going to perform this.', 'start': 74.808, 'duration': 6.243}, {'end': 87.233, 'text': 'data collection and data collection will also be divided into two days, so that you also get time to practice it.', 'start': 81.051, 'duration': 6.182}, {'end': 88.753, 'text': 'and please do change.', 'start': 87.233, 'duration': 1.52}, {'end': 90.874, 'text': 'please do practice this particular thing, guys.', 'start': 88.753, 'duration': 2.121}, {'end': 95.056, 'text': 'it is very, very important, and you can also write this particular project into your resume.', 'start': 90.874, 'duration': 4.182}, {'end': 101.877, 'text': "so we'll try to complete each and every life cycle of a data science project, that is, after pre-processing the data, after doing feature engineering,", 'start': 95.516, 'duration': 6.361}, {'end': 105.138, 'text': "we'll do feature selection, then we'll do models creation.", 'start': 101.877, 'duration': 3.261}, {'end': 113.54, 'text': "we'll try to use both machine learning and deep learning models by creating neural networks, and after that we'll do hyper 
parameter optimization.", 'start': 105.138, 'duration': 8.402}, {'end': 118.661, 'text': "we'll try to use techniques like auto ml, and then we will also be deploying this model.", 'start': 113.54, 'duration': 5.121}, {'end': 126.924, 'text': 'So, as I promised you that for this membership plan, what we have to do is that every month we try to complete one project.', 'start': 119.561, 'duration': 7.363}, {'end': 132.246, 'text': 'So usually the time that is required for completing this one project may be around 10 to 15 days.', 'start': 127.324, 'duration': 4.922}, {'end': 139.168, 'text': "I'll try to complete it within that so that you also get hands on experience onto that and you also practice a lot of things into it.", 'start': 132.386, 'duration': 6.782}, {'end': 140.869, 'text': 'so let us go ahead.', 'start': 139.648, 'duration': 1.221}, {'end': 143.271, 'text': 'and the first thing is that where do we collect the data?', 'start': 140.869, 'duration': 2.402}, {'end': 145.032, 'text': 'this is pretty much important.', 'start': 143.271, 'duration': 1.761}, {'end': 146.213, 'text': 'where do we collect the data?', 'start': 145.032, 'duration': 1.181}, {'end': 148.374, 'text': 'that is the most important thing now.', 'start': 146.213, 'duration': 2.161}, {'end': 149.675, 'text': 'collecting the data, guys,', 'start': 148.374, 'duration': 1.301}, {'end': 158.402, 'text': 'i tried to find a lot of third-party apis and i was not just able to get all the parameters that were required for air quality index,', 'start': 149.675, 'duration': 8.727}, {'end': 159.682, 'text': 'to predict the air quality index.', 'start': 158.402, 'duration': 1.28}, {'end': 169.429, 'text': 'So, but I found out one very good website where you can actually perform web scrapping and you can get some data from that, but the output label,', 'start': 160.823, 'duration': 8.606}], 'summary': 'The data science project focuses on air quality index, utilizing web scraping, third-party apis, and various data processing techniques, with a goal of completing one project per month within 10-15 days.', 'duration': 144.065, 'max_score': 25.364, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA25364.jpg'}, {'end': 226.034, 'src': 'embed', 'start': 184.739, 'weight': 2, 'content': [{'end': 194.017, 'text': "So I'm going to introduce you to a website which is called as this particular website, and again the link, the code,", 'start': 184.739, 'duration': 9.278}, {'end': 196.499, 'text': 'everything will be given in the github profile.', 'start': 194.017, 'duration': 2.482}, {'end': 199.52, 'text': 'the link will be given in the description box of this particular video.', 'start': 196.499, 'duration': 3.021}, {'end': 206.344, 'text': "so to begin with, i'm going to use this particular website and this particular website gives us the weather forecast for every 15 days.", 'start': 199.52, 'duration': 6.824}, {'end': 213.648, 'text': "and again, i'll just show you how you can basically web scrap by just writing a 20 to 30 lines of code,", 'start': 206.344, 'duration': 7.304}, {'end': 215.589, 'text': 'and we also try to use object oriented features.', 'start': 213.648, 'duration': 1.941}, {'end': 218.07, 'text': "i'll try to create main program over there.", 'start': 215.589, 'duration': 2.481}, {'end': 222.712, 'text': "i'll try to create various you know functions to extract the data.", 'start': 218.07, 'duration': 4.642}, {'end': 226.034, 'text': 'so make sure that you 
practice each and every line of code and you also do it.', 'start': 222.712, 'duration': 3.322}], 'summary': 'Introduction to website for 15-day weather forecast, demonstrating web scraping and object-oriented programming.', 'duration': 41.295, 'max_score': 184.739, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA184739.jpg'}], 'start': 1.658, 'title': 'Air quality and data science projects', 'summary': 'Provides an overview of a live project on air quality index, emphasizing comprehensive data science project lifecycle coverage and practical coding examples, including completing one project every month, and discussing a supervised machine learning project to predict air quality index, requiring 10 to 15 days for completion.', 'chapters': [{'end': 68.424, 'start': 1.658, 'title': 'Air quality index project overview', 'summary': 'Provides an overview of a live project on air quality index, focusing on data collection techniques such as web scraping and third-party apis, emphasizing the comprehensive coverage of the data science project life cycle, and encouraging audience engagement through practical coding examples.', 'duration': 66.766, 'highlights': ["The project covers the life cycle of a data science project and emphasizes data collection techniques like web scraping and third-party APIs, ensuring a comprehensive understanding of the project's scope and process.", 'The data collection process for the air quality index project involves collecting data from multiple resources and sources, providing a diverse and comprehensive dataset for analysis and modeling.', 'The presenter emphasizes practical coding examples, ensuring audience engagement and understanding through the provision of detailed code explanations and opportunities for practice.']}, {'end': 126.924, 'start': 68.424, 'title': 'Data science project lifecycle', 'summary': 'Covers the data science project lifecycle, including data pre-processing, feature engineering, model creation using machine learning and deep learning, hyper parameter optimization, and model deployment, with an emphasis on completing one project every month.', 'duration': 58.5, 'highlights': ['The chapter emphasizes completing one data science project every month for the membership plan.', 'Various tasks like feature engineering, data collection, and model creation are discussed in the chapter.', 'Importance of practicing and implementing the project in resume is highlighted for the viewers.']}, {'end': 226.034, 'start': 127.324, 'title': 'Air quality prediction project', 'summary': 'Discusses a supervised machine learning project to predict air quality index, requiring 10 to 15 days for completion, and the use of web scraping to collect weather forecast data for implementation.', 'duration': 98.71, 'highlights': ['The project completion time is estimated to be around 10 to 15 days, aiming to provide hands-on experience and practice for the participants. The project requires 10 to 15 days for completion, offering hands-on experience and practice.', 'The difficulty in finding all the necessary parameters for air quality index led to the use of web scraping from a specific website to gather required data. Difficulty in obtaining all parameters for air quality index led to the use of web scraping from a specific website.', 'Introduction of a website providing weather forecast for every 15 days, with detailed guidance on web scraping using 20 to 30 lines of code and object-oriented features. 
Introduction of a website offering weather forecast and guidance on web scraping using 20 to 30 lines of code and object-oriented features.']}], 'duration': 224.376, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA1658.jpg', 'highlights': ["The project covers the life cycle of a data science project and emphasizes data collection techniques like web scraping and third-party APIs, ensuring a comprehensive understanding of the project's scope and process.", 'The data collection process for the air quality index project involves collecting data from multiple resources and sources, providing a diverse and comprehensive dataset for analysis and modeling.', 'The presenter emphasizes practical coding examples, ensuring audience engagement and understanding through the provision of detailed code explanations and opportunities for practice.', 'The chapter emphasizes completing one data science project every month for the membership plan.', 'Various tasks like feature engineering, data collection, and model creation are discussed in the chapter.', 'Importance of practicing and implementing the project in resume is highlighted for the viewers.', 'The project completion time is estimated to be around 10 to 15 days, aiming to provide hands-on experience and practice for the participants.', 'The difficulty in finding all the necessary parameters for air quality index led to the use of web scraping from a specific website to gather required data.', 'Introduction of a website providing weather forecast for every 15 days, with detailed guidance on web scraping using 20 to 30 lines of code and object-oriented features.']}, {'end': 408.727, 'segs': [{'end': 278.476, 'src': 'embed', 'start': 248.664, 'weight': 0, 'content': [{'end': 254.026, 'text': "So, guys, so we'll be using this particular website and we'll try to collect the data.", 'start': 248.664, 'duration': 5.362}, {'end': 257.547, 'text': 'In order to collect the data, what you have to do is that first of all, go and click on climate.', 'start': 254.126, 'duration': 3.421}, {'end': 262.067, 'text': "Since we are actually trying to find out the air quality index, so we'll go and click on the climate.", 'start': 258.027, 'duration': 4.04}, {'end': 266.73, 'text': "After clicking the climate, I'm from India, so I'm going to select Asia.", 'start': 262.608, 'duration': 4.122}, {'end': 273.152, 'text': "I'm going to just select some of the states or some of the cities over here and try to find out the air quality index.", 'start': 267.25, 'duration': 5.902}, {'end': 278.476, 'text': "so, after i select asia, what i'll do is that i will go and select india,", 'start': 273.752, 'duration': 4.724}], 'summary': 'Using a website to collect air quality data from selected cities in india and asia.', 'duration': 29.812, 'max_score': 248.664, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA248664.jpg'}, {'end': 361.14, 'src': 'embed', 'start': 301.276, 'weight': 2, 'content': [{'end': 310.28, 'text': "what i'll do is that from 2013 till 2018, i'll try to collect the data now, if i go and see in 2013, for each and every month.", 'start': 301.276, 'duration': 9.004}, {'end': 315.963, 'text': 'what a wonderful website this is, because for each and every month, you have all the information right you.', 'start': 310.28, 'duration': 5.683}, {'end': 317.603, 'text': 'if you go and select january.', 'start': 315.963, 'duration': 1.64}, {'end': 324.206, 
'text': 'okay, now, observe over here, as soon as i selected bangalore, this code got automatically appended.', 'start': 317.603, 'duration': 6.603}, {'end': 328.408, 'text': 'that is w32 950 and this is basically my year.', 'start': 324.206, 'duration': 4.202}, {'end': 332.469, 'text': 'okay, so 2013 ws, and this is basically my code.', 'start': 328.408, 'duration': 4.061}, {'end': 335.01, 'text': 'if i select some other city, this code will change.', 'start': 332.469, 'duration': 2.541}, {'end': 336.25, 'text': 'okay, this code will change.', 'start': 335.01, 'duration': 1.24}, {'end': 341.913, 'text': "so you have to understand this thing, Because when you're using web scrapping, this is the most important point to note, right?", 'start': 336.25, 'duration': 5.663}, {'end': 346.794, 'text': "Because in the web scrapping, I'm going to write a generic code wherein I'll just change this year.", 'start': 342.293, 'duration': 4.501}, {'end': 351.296, 'text': "I'll change this code and I'll be able to download the data of different, different cities.", 'start': 346.794, 'duration': 4.502}, {'end': 353.617, 'text': "But right now I'm actually trying to see for Bangalore.", 'start': 351.356, 'duration': 2.261}, {'end': 361.14, 'text': 'now. suppose, if I go and select January now, you just see this if I select January, what will happen is that this particular date will get changed.', 'start': 354.077, 'duration': 7.063}], 'summary': 'Collecting data for bangalore from 2013-2018, web scraping to download monthly information for different cities based on code changes.', 'duration': 59.864, 'max_score': 301.276, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA301276.jpg'}], 'start': 226.034, 'title': 'Collecting air quality and weather data', 'summary': 'Focuses on using a website to collect air quality index data for bangalore, india from 2013 to 2018, emphasizing the climate section and city selection. additionally, it demonstrates web scraping techniques to extract weather data for different cities and months, highlighting the importance of code customization for data retrieval.', 'chapters': [{'end': 324.206, 'start': 226.034, 'title': 'Collecting air quality data in bangalore', 'summary': 'Focuses on using a website to collect air quality index data for bangalore, india from 2013 to 2018, with specific emphasis on the climate section and city selection.', 'duration': 98.172, 'highlights': ['The chapter focuses on using a website to collect air quality index data for Bangalore, India. The main focus of the chapter is on collecting air quality index data for Bangalore, India.', 'Emphasizes the process of selecting the climate section and city for data collection. The process of selecting the climate section and choosing the city, particularly Bangalore, for data collection is emphasized.', 'Data collection is targeted from 2013 to 2018, specifically for every month. 
The data collection effort targets the period from 2013 to 2018, with a focus on gathering data for every month within this timeframe.']}, {'end': 408.727, 'start': 324.206, 'title': 'Web scraping for weather data', 'summary': 'Demonstrates web scraping techniques to extract weather data for different cities and months, emphasizing the importance of understanding code customization for data retrieval.', 'duration': 84.521, 'highlights': ['The chapter demonstrates web scraping techniques to extract weather data for different cities and months The speaker explains how the code can be modified to download data for different cities and months, showcasing the flexibility of the web scraping process.', 'Emphasizes the importance of understanding code customization for data retrieval The speaker stresses the significance of understanding code customization when using web scraping for data retrieval, highlighting the critical aspect of modifying code for specific data extraction needs.', 'Explains the significance of code modification for different months and cities The speaker emphasizes the importance of modifying the code to cater to different months and cities, showcasing its adaptability for extracting diverse weather data.']}], 'duration': 182.693, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA226034.jpg', 'highlights': ['The chapter focuses on using a website to collect air quality index data for Bangalore, India.', 'Emphasizes the process of selecting the climate section and city for data collection.', 'Data collection is targeted from 2013 to 2018, specifically for every month.', 'The chapter demonstrates web scraping techniques to extract weather data for different cities and months.', 'Emphasizes the importance of understanding code customization for data retrieval.', 'Explains the significance of code modification for different months and cities.']}, {'end': 633.039, 'segs': [{'end': 505.563, 'src': 'embed', 'start': 449.717, 'weight': 0, 'content': [{'end': 459.639, 'text': 'that is what we have to do for scrapping the data and this all information is actually required to do the air quality index prediction right.', 'start': 449.717, 'duration': 9.922}, {'end': 463.401, 'text': "so for this, what i'm going to do is that i'm going to write a code for you all,", 'start': 459.639, 'duration': 3.762}, {'end': 467.903, 'text': "and this particular code that is what i'm writing will be completely from scratch.", 'start': 463.401, 'duration': 4.502}, {'end': 473.226, 'text': "okay, so we'll try to write this particular code completely from scratch with the help of spider id.", 'start': 467.903, 'duration': 5.323}, {'end': 478.809, 'text': "i'll write each and every line of code how i will be changing these values, everything i'll be actually showing.", 'start': 473.226, 'duration': 5.583}, {'end': 483.751, 'text': 'so let us go ahead and let us try to do and see how we can basically fetch this particular details.', 'start': 478.809, 'duration': 4.942}, {'end': 488.361, 'text': "Now, in this data collection part one, what I'm going to do is that I'm going to download.", 'start': 484.212, 'duration': 4.149}, {'end': 495.94, 'text': 'So suppose if I make it as 03, right? 
If I make it as 03 and enter, so this is a different HTML page.', 'start': 488.381, 'duration': 7.559}, {'end': 499.541, 'text': "Okay So first of all, I'm going to download all the HTML page.", 'start': 496.32, 'duration': 3.221}, {'end': 505.563, 'text': 'Okay With respect to all the years, with respect to, you know, all the months of that particular year.', 'start': 499.921, 'duration': 5.642}], 'summary': 'Scrape data to predict air quality index using code written from scratch with spider id for data collection.', 'duration': 55.846, 'max_score': 449.717, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA449717.jpg'}, {'end': 553.101, 'src': 'embed', 'start': 525.51, 'weight': 2, 'content': [{'end': 531.913, 'text': 'so let us go ahead and try to write a very good code where you will try to pick up the data from 2013 to 2018.', 'start': 525.51, 'duration': 6.403}, {'end': 535.375, 'text': 'all this particular information that is required and these are all my independent features.', 'start': 531.913, 'duration': 3.462}, {'end': 540.018, 'text': 'guys, these are all my independent features, and this leads to the change in air quality index.', 'start': 535.375, 'duration': 4.643}, {'end': 542.419, 'text': 'okay, so this is pretty much important to understand.', 'start': 540.018, 'duration': 2.401}, {'end': 553.101, 'text': 'so let us go ahead, and i will just go and open my spider id and make sure that, guys, uh, you are working in spider python environment 3.7 or 3.6.', 'start': 542.779, 'duration': 10.322}], 'summary': 'Code to gather data from 2013-2018 for air quality index.', 'duration': 27.591, 'max_score': 525.51, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA525510.jpg'}, {'end': 599.877, 'src': 'heatmap', 'start': 559.323, 'weight': 0.828, 'content': [{'end': 563.524, 'text': "before that, what i'm going to do is that i'm going to import some libraries.", 'start': 559.323, 'duration': 4.201}, {'end': 571.065, 'text': "so the first library i'm going to import is something like import os import time.", 'start': 563.524, 'duration': 7.541}, {'end': 577.087, 'text': 'so, because I want to see that, how much time the execution will basically take place to download all the HTML files?', 'start': 571.065, 'duration': 6.022}, {'end': 582.489, 'text': "but before that, this what I'll do is that I'll just make a folder over here.", 'start': 577.087, 'duration': 5.402}, {'end': 588.551, 'text': "I'll just make a folder saying that all my data, all my data, should be actually stored over here.", 'start': 582.489, 'duration': 6.062}, {'end': 590.351, 'text': 'okay, all my data should be actually.', 'start': 588.551, 'duration': 1.8}, {'end': 593.352, 'text': 'so whatever data I am actually collecting, that will be stored over here.', 'start': 590.351, 'duration': 3.001}, {'end': 597.996, 'text': 'the second, uh, the second folder that i can create.', 'start': 593.732, 'duration': 4.264}, {'end': 599.877, 'text': "uh, i, i don't have to create anything.", 'start': 597.996, 'duration': 1.881}], 'summary': 'Imported libraries to measure download time and created folders for data storage.', 'duration': 40.554, 'max_score': 559.323, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA559323.jpg'}, {'end': 588.551, 'src': 'embed', 'start': 563.524, 'weight': 3, 'content': [{'end': 571.065, 'text': "so the first library i'm going to 
import is something like import os import time.", 'start': 563.524, 'duration': 7.541}, {'end': 577.087, 'text': 'so, because I want to see that, how much time the execution will basically take place to download all the HTML files?', 'start': 571.065, 'duration': 6.022}, {'end': 582.489, 'text': "but before that, this what I'll do is that I'll just make a folder over here.", 'start': 577.087, 'duration': 5.402}, {'end': 588.551, 'text': "I'll just make a folder saying that all my data, all my data, should be actually stored over here.", 'start': 582.489, 'duration': 6.062}], 'summary': 'Importing os and time libraries to measure download time and creating a folder for data storage.', 'duration': 25.027, 'max_score': 563.524, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA563524.jpg'}], 'start': 408.727, 'title': 'Web scrapping for air quality index prediction', 'summary': 'Covers web scraping techniques for air quality index prediction, including the use of a generic code with a for loop to collect data for each month and year, and the plan to write a code from scratch using spider id for data collection and applying beautiful soup for html parsing. it also discusses the process of retrieving and storing data from 2013 to 2018 in python using libraries such as os, time, and requests, emphasizing the importance of understanding the impact on air quality index.', 'chapters': [{'end': 525.51, 'start': 408.727, 'title': 'Web scrapping for air quality index prediction', 'summary': 'Discusses the technique to scrape data from a website for air quality index prediction, including the use of a generic code with a for loop to collect data for each month and year, and the plan to write a code from scratch using spider id for data collection and applying beautiful soup for html parsing.', 'duration': 116.783, 'highlights': ['The technique to scrape data from a website involves using a generic code with a for loop to collect data for each month and year, enabling the prediction of air quality index.', 'The plan to write a code from scratch using Spider ID for data collection and applying Beautiful Soup for HTML parsing.', 'Downloading all the HTML pages with respect to all the years and months of a particular year for data collection.']}, {'end': 633.039, 'start': 525.51, 'title': 'Data retrieval and storage in python', 'summary': 'Discusses the process of retrieving and storing data from 2013 to 2018 in python using libraries such as os, time, and requests, emphasizing the importance of understanding the impact on air quality index.', 'duration': 107.529, 'highlights': ['The chapter discusses the process of retrieving and storing data from 2013 to 2018 in Python. It involves retrieving and storing data from 2013 to 2018.', 'The importance of understanding the impact on air quality index is emphasized. The independent features lead to changes in air quality index, which is crucial to comprehend.', 'Libraries such as os, time, and requests are used for data retrieval and storage. 
The code utilizes libraries like os, time, and requests for data retrieval and storage.']}], 'duration': 224.312, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA408727.jpg', 'highlights': ['The plan to write a code from scratch using Spider ID for data collection and applying Beautiful Soup for HTML parsing.', 'The technique to scrape data from a website involves using a generic code with a for loop to collect data for each month and year, enabling the prediction of air quality index.', 'The importance of understanding the impact on air quality index is emphasized. The independent features lead to changes in air quality index, which is crucial to comprehend.', 'Libraries such as os, time, and requests are used for data retrieval and storage. The code utilizes libraries like os, time, and requests for data retrieval and storage.', 'The chapter discusses the process of retrieving and storing data from 2013 to 2018 in Python. It involves retrieving and storing data from 2013 to 2018.', 'Downloading all the HTML pages with respect to all the years and months of a particular year for data collection.']}, {'end': 1049.471, 'segs': [{'end': 699.93, 'src': 'embed', 'start': 654.849, 'weight': 0, 'content': [{'end': 656.69, 'text': 'so for that what i first have to do.', 'start': 654.849, 'duration': 1.841}, {'end': 661.312, 'text': "as i told you, i'll be collecting the data between 2013 till 2018.", 'start': 656.69, 'duration': 4.622}, {'end': 664.333, 'text': 'okay, so, all the data, all the information will be collected.', 'start': 661.312, 'duration': 3.021}, {'end': 667.255, 'text': "so for that, what i'll do is that i'll simply write a for loop.", 'start': 664.333, 'duration': 2.922}, {'end': 668.055, 'text': "so i'll write it.", 'start': 667.255, 'duration': 0.8}, {'end': 679.736, 'text': "as for year in range, okay, and i've already told you from 2013 till 2018, okay, so, but i have to take the last index, that is 2019.", 'start': 668.055, 'duration': 11.681}, {'end': 683.438, 'text': "okay, so i'll take all this particular year.", 'start': 679.736, 'duration': 3.702}, {'end': 685.619, 'text': "okay, so, all the years i'm basically taking,", 'start': 683.438, 'duration': 2.181}, {'end': 689.962, 'text': 'because i need to collect the data for each and every year of this particular from that particular website.', 'start': 685.619, 'duration': 4.343}, {'end': 699.93, 'text': 'The next thing that I will do is that I will simply say that I am going to construct a URL, and this URL will be dynamically constructed.', 'start': 690.402, 'duration': 9.528}], 'summary': 'Collecting data from 2013 to 2018 using a for loop to construct dynamically generated urls.', 'duration': 45.081, 'max_score': 654.849, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA654849.jpg'}, {'end': 936.408, 'src': 'embed', 'start': 910.818, 'weight': 4, 'content': [{'end': 915.42, 'text': 'if I keep this 0, it will become 0, 1, 0, and this is how dot format actually works, you know.', 'start': 910.818, 'duration': 4.602}, {'end': 918.301, 'text': "so this is how I'm actually retrieving all the data.", 'start': 915.42, 'duration': 2.881}, {'end': 922.143, 'text': "so how I'm actually setting up dynamically my URL to collect the data.", 'start': 918.301, 'duration': 3.842}, {'end': 925.444, 'text': "okay, and remember, I've used dot format into this now.", 'start': 922.143, 'duration': 3.301}, {'end': 926.704, 
'text': 'this is pretty much amazing.', 'start': 925.444, 'duration': 1.26}, {'end': 932.007, 'text': "I'm iterating through all the years that I want the data for and for each and every month I want the data for.", 'start': 926.704, 'duration': 5.303}, {'end': 934.588, 'text': 'pretty much simple, pretty much easy.', 'start': 932.007, 'duration': 2.581}, {'end': 936.408, 'text': 'let us see what is the warning over here.', 'start': 934.588, 'duration': 1.82}], 'summary': 'Using dot format to dynamically retrieve and iterate through data for multiple years and months.', 'duration': 25.59, 'max_score': 910.818, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA910818.jpg'}, {'end': 1020.719, 'src': 'heatmap', 'start': 994.992, 'weight': 5, 'content': [{'end': 1001.936, 'text': 'okay, now, when we retrieve the url, when we retrieve the information, we also have to do an utf encoding.', 'start': 994.992, 'duration': 6.944}, {'end': 1003.336, 'text': 'now, why utf encoding is there?', 'start': 1001.936, 'duration': 1.4}, {'end': 1008.479, 'text': 'because there are some characters that are basically there in the html tag which we need to fix it.', 'start': 1003.336, 'duration': 5.143}, {'end': 1020.719, 'text': "okay, so for that what i'm going to do is that i'll say text underscore, utf, okay, and for this i'm going to use text dot encode.", 'start': 1008.479, 'duration': 12.24}], 'summary': 'Retrieve and encode url information with utf to fix characters in html tags.', 'duration': 25.727, 'max_score': 994.992, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA994992.jpg'}], 'start': 633.039, 'title': 'Data collection and dynamic url generation', 'summary': 'Discusses the process of collecting data for each month and year between 2013 and 2018 using a for loop and dynamically constructing a url, as well as the dynamic generation of a url for collecting data by iterating through months and years, using placeholders and dot format.', 'chapters': [{'end': 767.492, 'start': 633.039, 'title': 'Data collection process and url construction', 'summary': 'Discusses the process of collecting data for each month and year between 2013 and 2018 using a for loop and dynamically constructing a url using string formatting.', 'duration': 134.453, 'highlights': ['The process involves collecting data for each month and year between 2013 and 2018. The speaker mentions the importance of collecting data for each year and month between 2013 and 2018 from a particular website.', 'Dynamically constructing a URL using string formatting. The speaker explains the process of dynamically constructing a URL by using string formatting to include the changing year and month in the URL.', 'Utilizing a for loop to iterate through each year and month for data collection. The speaker outlines the use of a for loop to iterate through each year and month between 2013 and 2018 for data collection.']}, {'end': 1049.471, 'start': 767.492, 'title': 'Dynamic url generation and html retrieval', 'summary': 'Discusses the dynamic generation of a url for collecting data by iterating through months and years, using placeholders and dot format, and then retrieving the html content using the request module with utf encoding for fixing characters in html tags.', 'duration': 281.979, 'highlights': ['The chapter discusses the dynamic generation of a URL for collecting data by iterating through months and years. 
The code iterates through a range of 1 to 13 to represent the 12 months in a year, dynamically generating a URL to collect data for each month and year.', 'Using placeholders and dot format to dynamically set up the URL for data collection. The speaker demonstrates the use of placeholders and dot format to dynamically replace month and year in the URL, ensuring accurate representation of the desired data.', 'Retrieving the HTML content using the request module and applying utf encoding for fixing characters in HTML tags. The process of retrieving the HTML content from the generated URL is explained, with emphasis on utilizing the request module and applying utf encoding to address character issues within HTML tags.']}], 'duration': 416.432, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA633039.jpg', 'highlights': ['The process involves collecting data for each month and year between 2013 and 2018.', 'The chapter discusses the dynamic generation of a URL for collecting data by iterating through months and years.', 'Utilizing a for loop to iterate through each year and month for data collection.', 'Dynamically constructing a URL using string formatting.', 'Using placeholders and dot format to dynamically set up the URL for data collection.', 'Retrieving the HTML content using the request module and applying utf encoding for fixing characters in HTML tags.']}, {'end': 1494.826, 'segs': [{'end': 1284.666, 'src': 'embed', 'start': 1260.211, 'weight': 3, 'content': [{'end': 1269.803, 'text': "so here then i'm going to make a month folder okay and i'm going to open this particular file in the form of write byte mode.", 'start': 1260.211, 'duration': 9.592}, {'end': 1275.602, 'text': "okay, if you, if you don't write this right byte board, if you just write write mode, uh, you'll get some bytes issue.", 'start': 1270.66, 'duration': 4.942}, {'end': 1279.964, 'text': "that is what i faced while doing it so and i'm going to say it as okay.", 'start': 1275.602, 'duration': 4.362}, {'end': 1284.666, 'text': 'consider this as my output okay,', 'start': 1279.964, 'duration': 4.702}], 'summary': 'Creating a month folder and opening a file in write byte mode to avoid bytes issue.', 'duration': 24.455, 'max_score': 1260.211, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA1260211.jpg'}, {'end': 1338.787, 'src': 'embed', 'start': 1309.877, 'weight': 4, 'content': [{'end': 1314.178, 'text': 'because in 2013, I want all the files like Jan.', 'start': 1309.877, 'duration': 4.301}, {'end': 1317.599, 'text': 'Jan file Feb file HTML files.', 'start': 1314.178, 'duration': 3.421}, {'end': 1317.919, 'text': 'in short.', 'start': 1317.599, 'duration': 0.32}, {'end': 1322.521, 'text': "So I'm basically making a folder structure first of all, all the years folder will get created.", 'start': 1317.94, 'duration': 4.581}, {'end': 1324.262, 'text': "now, after that, what i'm going?", 'start': 1322.981, 'duration': 1.281}, {'end': 1329.764, 'text': "i'm saying that with with open and i can remove this double dot because i have not put inside any folder.", 'start': 1324.262, 'duration': 5.502}, {'end': 1338.787, 'text': "so i can write with open data, html, underscore data right and i'm trying to open the year and the month file in the right mode as output and uh,", 'start': 1329.764, 'duration': 9.023}], 'summary': 'Creating folder structure for yearly files and opening month files in write mode.', 
'duration': 28.91, 'max_score': 1309.877, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA1309877.jpg'}, {'end': 1377.561, 'src': 'embed', 'start': 1347.09, 'weight': 2, 'content': [{'end': 1350.972, 'text': 'whole text that is utf encoded and this is basically retrieved from all the html.', 'start': 1347.09, 'duration': 3.882}, {'end': 1353.455, 'text': "So, in short, you'll be seeing that.", 'start': 1351.352, 'duration': 2.103}, {'end': 1358.561, 'text': 'finally, okay, one thing I missed inside this is that I have to make this as html.', 'start': 1353.455, 'duration': 5.106}, {'end': 1362.066, 'text': "Because I'm actually creating a .html file.", 'start': 1359.683, 'duration': 2.383}, {'end': 1370.136, 'text': "So this you can see that inside my here, I'm actually going to create whatever month .html is basically getting created.", 'start': 1362.386, 'duration': 7.75}, {'end': 1377.561, 'text': "you'll just see that after I run this particular stuff and if you have not understand this, understood this, just try to execute line by line.", 'start': 1370.436, 'duration': 7.125}], 'summary': 'Creating a .html file to display retrieved data from html, executing line by line.', 'duration': 30.471, 'max_score': 1347.09, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA1347090.jpg'}, {'end': 1457.442, 'src': 'embed', 'start': 1416.634, 'weight': 0, 'content': [{'end': 1424.92, 'text': "you are able to actually scrap that whole data and i'll not say scrapping, because here you are not used beautiful soup, but here instead,", 'start': 1416.634, 'duration': 8.286}, {'end': 1439.166, 'text': "you are using this request and you are downloading all everything in the form of html uh files and this is where this html file is getting created and try to execute this whole line and we'll also try to see whether we are getting any errors or not.", 'start': 1424.92, 'duration': 14.246}, {'end': 1442.569, 'text': 'okay, but just understand that we have actually written it pretty much simple.', 'start': 1439.166, 'duration': 3.403}, {'end': 1443.07, 'text': 'first of all,', 'start': 1442.569, 'duration': 0.501}, {'end': 1455.621, 'text': 'between 2013 to 2019 and we are iterating to every month between 1 to 13 we are setting up this particular url and i i told you why we are setting up this dynamic url over here and this data we are collecting only for bangalore.', 'start': 1443.07, 'duration': 12.551}, {'end': 1457.442, 'text': 'then, uh, i have my text.', 'start': 1455.621, 'duration': 1.821}], 'summary': 'Using requests to download html files, collecting data for bangalore from 2013 to 2019 and iterating through each month.', 'duration': 40.808, 'max_score': 1416.634, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA1416634.jpg'}], 'start': 1049.471, 'title': 'Storing and retrieving html data', 'summary': 'Covers creating and storing html data in specific folders, creating monthly html files, and retrieving data from a dynamic url using python, aiming to automate processes and collect data for bangalore from 2013 to 2019 with minimal code.', 'chapters': [{'end': 1260.211, 'start': 1049.471, 'title': 'Creating and storing html data', 'summary': "Discusses creating and storing html data within specific folders, checking for their existence, and using open function to access and save data, with the aim to automate the process for each year's 
data.", 'duration': 210.74, 'highlights': ['The process involves checking if the data folder contains a folder named html_data and a folder for each year, creating them if absent using os.make_directory, and then saving the data inside them.', "The code checks if the specified folders exist using if not os.path.exist and then creates them using os.make_directory, automating the process for each year's data.", 'The open function is used to access the HTML file within the specified folder and then save the data inside it for each year, simplifying the data storage process.']}, {'end': 1370.136, 'start': 1260.211, 'title': 'Creating monthly html files', 'summary': 'Explains how to create a folder structure for html files for each year and month, ensuring data is written in utf format, and files are created with the .html extension.', 'duration': 109.925, 'highlights': ['It details the process of creating a folder structure for HTML files for each year and month, ensuring the creation of .html files. This ensures organized storage of HTML files, facilitating easy retrieval and management.', 'The importance of writing data in UTF format is highlighted to ensure proper encoding and decoding of text, preventing data corruption or display issues.', 'The significance of opening files in write byte mode instead of write mode is explained, emphasizing the avoidance of byte-related issues during file operations.']}, {'end': 1494.826, 'start': 1370.436, 'title': 'Data retrieval and processing', 'summary': 'Explains the process of retrieving data from a dynamic url using python, involving code execution, file creation, and data collection for bangalore from 2013 to 2019, all achieved with just 20 lines of code.', 'duration': 124.39, 'highlights': ['By writing just 20 lines of code, you are able to actually retrieve the whole data, including downloading everything in the form of HTML files using request, and processing it without using beautiful soup.', 'The process involves setting up a dynamic URL to collect data for Bangalore between 2013 to 2019, iterating through every month, creating a directory if not present, and creating HTML files for each month and year.', "The code includes importing the 'sys' module and using 'sys.stdout.flush' to flush everything getting created in the file, and creating a directory if it does not exist, followed by creating an HTML file for each month and year."]}], 'duration': 445.355, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA1049471.jpg', 'highlights': ['By writing just 20 lines of code, you are able to actually retrieve the whole data, including downloading everything in the form of HTML files using request, and processing it without using beautiful soup.', 'The process involves setting up a dynamic URL to collect data for Bangalore between 2013 to 2019, iterating through every month, creating a directory if not present, and creating HTML files for each month and year.', 'The importance of writing data in UTF format is highlighted to ensure proper encoding and decoding of text, preventing data corruption or display issues.', 'The significance of opening files in write byte mode instead of write mode is explained, emphasizing the avoidance of byte-related issues during file operations.', 'It details the process of creating a folder structure for HTML files for each year and month, ensuring the creation of .html files. 
This ensures organized storage of HTML files, facilitating easy retrieval and management.']}, {'end': 1813.964, 'segs': [{'end': 1534.072, 'src': 'embed', 'start': 1507.755, 'weight': 0, 'content': [{'end': 1513.717, 'text': 'And you know why main function is basically used because this is my starting point of my execution in Python programming language.', 'start': 1507.755, 'duration': 5.962}, {'end': 1519.56, 'text': "So I'll say, underscore underscore main, underscore underscore, and I'm going to use a colon.", 'start': 1514.157, 'duration': 5.403}, {'end': 1523.243, 'text': 'Let me just say that start underscore time.', 'start': 1520.4, 'duration': 2.843}, {'end': 1528.567, 'text': "I'm going to also note down the time, how much time it basically takes to get executed.", 'start': 1523.243, 'duration': 5.324}, {'end': 1530.549, 'text': "and this is where I'm executing it.", 'start': 1528.567, 'duration': 1.982}, {'end': 1534.072, 'text': 'and then I have.', 'start': 1530.549, 'duration': 3.523}], 'summary': 'Main function used as starting point in python. tracking execution time.', 'duration': 26.317, 'max_score': 1507.755, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA1507755.jpg'}, {'end': 1640.318, 'src': 'heatmap', 'start': 1571.267, 'weight': 1, 'content': [{'end': 1581.213, 'text': "what i'm going to do is that i'm going to just write dot format And I'm going to subtract stop underscore time, minus.", 'start': 1571.267, 'duration': 9.946}, {'end': 1585.076, 'text': 'start underscore time, because just to find out how much time it is basically taking,', 'start': 1581.213, 'duration': 3.863}, {'end': 1589.903, 'text': 'Okay guys, I have also made one change over here.', 'start': 1587.662, 'duration': 2.241}, {'end': 1599.369, 'text': "I had initially written dot dot slash, but what I'll do is that I'll just remove this dot dot, because you can see that if I go and see over here,", 'start': 1589.923, 'duration': 9.446}, {'end': 1601.23, 'text': 'this is how my folder structure looks like.', 'start': 1599.369, 'duration': 1.861}, {'end': 1604.171, 'text': 'My HTML underscore script dot ty looks like this.', 'start': 1601.71, 'duration': 2.461}, {'end': 1606.813, 'text': "So I'm just going to remove that dot dot dot for everything.", 'start': 1604.331, 'duration': 2.482}, {'end': 1612.647, 'text': "Now what I'm going to do is that I've written this particular code and this particular code.", 'start': 1607.321, 'duration': 5.326}, {'end': 1616.111, 'text': 'you can see that all this particular condition in the inside this particular for loop.', 'start': 1612.647, 'duration': 3.464}, {'end': 1618.134, 'text': 'please make sure that you have written in that particular way.', 'start': 1616.111, 'duration': 2.023}, {'end': 1621.418, 'text': 'So it will be helpful for you otherwise you may get some issues also.', 'start': 1618.674, 'duration': 2.744}, {'end': 1624.421, 'text': "Now what I'm going to do is I'm also going to execute this.", 'start': 1621.838, 'duration': 2.583}, {'end': 1627.104, 'text': 'Now let us go ahead and see whether this code works or not.', 'start': 1624.461, 'duration': 2.643}, {'end': 1629.607, 'text': 'So it will take some amount of time.', 'start': 1628.085, 'duration': 1.522}, {'end': 1632.009, 'text': 'Here you can see that my HTML data is getting created.', 'start': 1629.627, 'duration': 2.382}, {'end': 1634.071, 'text': 'My 2013 data is getting created.', 'start': 1632.049, 'duration': 2.022}, {'end': 
1640.318, 'text': 'All the HTML files have been getting created because of this request library that you can see.', 'start': 1634.892, 'duration': 5.426}], 'summary': 'Subtract stop time from start time to find execution time, remove unnecessary code, and execute, resulting in successful creation of html files.', 'duration': 69.051, 'max_score': 1571.267, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA1571267.jpg'}, {'end': 1672.195, 'src': 'embed', 'start': 1642.937, 'weight': 3, 'content': [{'end': 1645.399, 'text': 'now my 2013 will be ready.', 'start': 1642.937, 'duration': 2.462}, {'end': 1647.16, 'text': "i'll be having 12 files.", 'start': 1645.399, 'duration': 1.761}, {'end': 1651.963, 'text': 'uh, over here one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve.', 'start': 1647.16, 'duration': 4.803}, {'end': 1654.025, 'text': "now. similarly, it'll take some amount of time.", 'start': 1651.963, 'duration': 2.062}, {'end': 1654.645, 'text': 'you know.', 'start': 1654.025, 'duration': 0.62}, {'end': 1660.249, 'text': 'you can also create some kind of uh visualization stuff where it can show you in the form of arrow.', 'start': 1654.645, 'duration': 5.604}, {'end': 1661.249, 'text': 'now this is my 2000.', 'start': 1660.769, 'duration': 0.48}, {'end': 1664.171, 'text': 'you can see all the 12 months has been retrieved.', 'start': 1661.249, 'duration': 2.922}, {'end': 1665.912, 'text': 'now i can go to my 2014.', 'start': 1664.171, 'duration': 1.741}, {'end': 1667.312, 'text': 'it is getting retrieved.', 'start': 1665.912, 'duration': 1.4}, {'end': 1672.195, 'text': 'similarly, 2015, it is getting retrieved and all the html files are getting retrieved.', 'start': 1667.312, 'duration': 4.883}], 'summary': 'Preparing 12 files for 2013, retrieving data for 2014 and 2015, creating visualization with arrows.', 'duration': 29.258, 'max_score': 1642.937, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA1642937.jpg'}, {'end': 1726.395, 'src': 'embed', 'start': 1685.992, 'weight': 2, 'content': [{'end': 1694.297, 'text': 'okay, and remember, guys, my output feature, yet i have to collect it from my other source, that is the air quality index.', 'start': 1685.992, 'duration': 8.305}, {'end': 1695.498, 'text': 'so here are we guys.', 'start': 1694.297, 'duration': 1.201}, {'end': 1698.46, 'text': 'everything has got recorded, everything has got scrapped.', 'start': 1695.498, 'duration': 2.962}, {'end': 1703.57, 'text': 'so here we have all the html pages that we wanted, the data from 2013 to 18..', 'start': 1698.46, 'duration': 5.11}, {'end': 1705.032, 'text': 'and now this html pages.', 'start': 1703.57, 'duration': 1.462}, {'end': 1712.242, 'text': "we have to apply beautiful soup and retrieve all the table information and that for that we'll be using beautiful soup.", 'start': 1705.032, 'duration': 7.21}, {'end': 1719.952, 'text': 'you can see that in that html pages will have all this information right and if i go and inspect my table here,', 'start': 1712.242, 'duration': 7.71}, {'end': 1722.774, 'text': 'you will be able to see If I use this particular class.', 'start': 1719.952, 'duration': 2.822}, {'end': 1726.395, 'text': "with the help of beautiful soup I'll be able to retrieve all this particular table.", 'start': 1722.774, 'duration': 3.621}], 'summary': 'Data from 2013 to 2018 collected and scraped using beautiful soup to retrieve table information.', 
'duration': 40.403, 'max_score': 1685.992, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA1685992.jpg'}, {'end': 1812.522, 'src': 'embed', 'start': 1745.298, 'weight': 5, 'content': [{'end': 1748.179, 'text': 'You can select any other countries, whatever countries you want.', 'start': 1745.298, 'duration': 2.881}, {'end': 1750.461, 'text': 'just go and select asia.', 'start': 1748.639, 'duration': 1.822}, {'end': 1752.702, 'text': 'suppose in asia you want india.', 'start': 1750.461, 'duration': 2.241}, {'end': 1761.929, 'text': 'go and select india, and if you want some other states or some other cities like bangalore, you can select that.', 'start': 1752.702, 'duration': 9.227}, {'end': 1764.091, 'text': 'you can again use this particular code,', 'start': 1761.929, 'duration': 2.162}, {'end': 1770.156, 'text': 'but make sure that you write this particular condition to do each and everything right and automatically all your data will be collected.', 'start': 1764.091, 'duration': 6.065}, {'end': 1772.297, 'text': 'So this is my current data that has got collected.', 'start': 1770.436, 'duration': 1.861}, {'end': 1774.398, 'text': "Remember, I still don't have air quality index.", 'start': 1772.357, 'duration': 2.041}, {'end': 1779.021, 'text': "I'll be importing that from some third party API and then we'll try to combine.", 'start': 1774.478, 'duration': 4.543}, {'end': 1782.563, 'text': "we'll scrap this data and try to create this particular data set in the tomorrow's session.", 'start': 1779.021, 'duration': 3.542}, {'end': 1784.785, 'text': 'So yes, this was all about this particular session.', 'start': 1783.004, 'duration': 1.781}, {'end': 1785.565, 'text': 'I hope you like it.', 'start': 1784.825, 'duration': 0.74}, {'end': 1793.29, 'text': "Please do subscribe the channel if you're not already subscribed and please do take this as a membership opportunity for my YouTube channel.", 'start': 1786.045, 'duration': 7.245}, {'end': 1795.811, 'text': "I'll still, there are a lot of many things to do.", 'start': 1793.67, 'duration': 2.141}, {'end': 1798.833, 'text': 'This is just the data collection stage, part one of the data collection stage.', 'start': 1795.871, 'duration': 2.962}, {'end': 1803.991, 'text': "and i'll be uploading this particular script in the github so that you will be able to access it.", 'start': 1799.845, 'duration': 4.146}, {'end': 1805.072, 'text': "so that's it all, guys.", 'start': 1803.991, 'duration': 1.081}, {'end': 1810.6, 'text': 'uh um, this was all about this particular video practice with some other, some other cities, and try to do more things.', 'start': 1805.072, 'duration': 5.528}, {'end': 1811.862, 'text': "so yes, i'll see you all in the next video.", 'start': 1810.6, 'duration': 1.262}, {'end': 1812.522, 'text': 'have a great day.', 'start': 1811.862, 'duration': 0.66}], 'summary': 'Data collected for asia, india, and bangalore; air quality index to be imported from third-party api for dataset creation in the next session.', 'duration': 67.224, 'max_score': 1745.298, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA1745298.jpg'}], 'start': 1494.826, 'title': 'Python execution, time measurement, data collection, and scraping process', 'summary': 'Covers writing the main function in python to measure execution time, creating html files for data collection, and using beautiful soup for data scraping. 
it also highlights the process of data collection using a scripting language, including importing air quality index data and a call to action for audience engagement.', 'chapters': [{'end': 1621.418, 'start': 1494.826, 'title': 'Python execution and time measurement', 'summary': 'Discusses writing the main function in python to measure the execution time of a function using the time library and formatting the output, emphasizing the importance of proper code structure for avoiding errors.', 'duration': 126.592, 'highlights': ["The function 'main' is used as the starting point of execution in Python programming language, and the time taken for execution is measured using the time library. Importance of 'main' function, time measurement in Python", 'Emphasizes the importance of writing code in a particular way inside a loop to avoid potential issues. Importance of code structure']}, {'end': 1726.395, 'start': 1621.838, 'title': 'Data collection and scraping process', 'summary': 'Discusses the process of scraping html data from 2013 to 2018, creating 12 html files for each year and preparing for visualization, anticipating a significant number of records, and the next step involves using beautiful soup to retrieve table information.', 'duration': 104.557, 'highlights': ['The process involves scraping HTML data from 2013 to 2018 and creating 12 HTML files for each year, potentially resulting in a significant number of records.', 'Visualization features are being considered for the data, and the next step will include using Beautiful Soup to retrieve table information from the HTML pages.', 'The data is being retrieved and stored locally, and additional data from the air quality index is yet to be collected for the output feature.']}, {'end': 1813.964, 'start': 1726.835, 'title': 'Data collection and scripting', 'summary': 'Covers the process of data collection using a scripting language, highlighting the steps to select and extract data for various cities and countries, with a plan to import air quality index data from a third-party api and combine it, concluding with a call to action for audience engagement and a promise to share the script on github.', 'duration': 87.129, 'highlights': ['The data collection process involves selecting and extracting data for various cities and countries, demonstrating the steps to change the code for different locations and emphasizing the importance of writing specific conditions for accurate data collection.', 'The plan includes importing air quality index data from a third-party API and combining it with the existing data set in the upcoming session, indicating a future development in the data collection process.', 'The presenter encourages audience engagement by inviting subscriptions to the YouTube channel and promoting a membership opportunity, indicating a call to action for viewer participation and support.', 'The session concludes with the promise to share the script on GitHub, offering accessibility to the demonstrated data collection process for the audience, and teasing the continuation of the data collection in the next video, creating anticipation for future content.']}], 'duration': 319.138, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/CTu0qnuMxgA/pics/CTu0qnuMxgA1494826.jpg', 'highlights': ["Importance of 'main' function, time measurement in Python", 'Importance of code structure', 'Visualization features considered for the data', 'Creating 12 HTML files for each year', 'Using Beautiful Soup to retrieve table 
information', 'Importing air quality index data for output feature', 'Selecting and extracting data for various cities and countries', 'Importing air quality index data from a third-party API', 'Emphasizing the importance of writing specific conditions for accurate data collection', 'Inviting subscriptions to the YouTube channel and promoting a membership opportunity', 'Sharing the script on GitHub for accessibility', 'Teasing the continuation of the data collection in the next video']}], 'highlights': ["The project covers the life cycle of a data science project and emphasizes data collection techniques like web scraping and third-party APIs, ensuring a comprehensive understanding of the project's scope and process.", 'The data collection process for the air quality index project involves collecting data from multiple resources and sources, providing a diverse and comprehensive dataset for analysis and modeling.', 'The presenter emphasizes practical coding examples, ensuring audience engagement and understanding through the provision of detailed code explanations and opportunities for practice.', 'The chapter emphasizes completing one data science project every month for the membership plan.', 'The plan to write a code from scratch using Spider ID for data collection and applying Beautiful Soup for HTML parsing.', 'The technique to scrape data from a website involves using a generic code with a for loop to collect data for each month and year, enabling the prediction of air quality index.', 'The process involves collecting data for each month and year between 2013 and 2018.', 'The chapter discusses the dynamic generation of a URL for collecting data by iterating through months and years.', 'By writing just 20 lines of code, you are able to actually retrieve the whole data, including downloading everything in the form of HTML files using request, and processing it without using beautiful soup.', "Importance of 'main' function, time measurement in Python", 'Importance of code structure', 'Visualization features considered for the data', 'Creating 12 HTML files for each year', 'Using Beautiful Soup to retrieve table information', 'Importing air quality index data for output feature', 'Selecting and extracting data for various cities and countries', 'Importing air quality index data from a third-party API', 'Emphasizing the importance of writing specific conditions for accurate data collection', 'Inviting subscriptions to the YouTube channel and promoting a membership opportunity', 'Sharing the script on GitHub for accessibility', 'Teasing the continuation of the data collection in the next video']}
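
code sketches
The detail section above walks through the scraping script step by step but never shows it in one place, so what follows are minimal Python sketches reconstructing each step from the transcript. They are not the author's exact code: the website host below is a placeholder (the transcript never names the site), and "w32950" is the Bangalore station code as heard in the audio, which may be garbled and in any case changes per city. First, the dynamic URL construction: iterate over the years 2013 to 2018 and the months 1 to 12, zero-padding single-digit months with str.format as the video describes.

# Sketch of the dynamic URL step. Host, path shape, and station code are
# assumptions; only the year/month ranges and the zero-padding come from
# the transcript.
BASE_URL = "https://example-weather-site.net/climate/{month}-{year}/{code}.html"

def build_url(year, month, code="w32950"):
    # Zero-pad single-digit months (3 -> "03"), mirroring the transcript's
    # "if I keep this 0, it will become 01" dot-format explanation.
    return BASE_URL.format(month="{:02d}".format(month), year=year, code=code)

for year in range(2013, 2019):    # 2013 through 2018 inclusive
    for month in range(1, 13):    # months 1 through 12
        print(build_url(year, month))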
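
Next, fetching one month's page with the requests library and UTF-8 encoding the text, which the video does to fix problem characters in the HTML tags before writing the page to disk. A minimal sketch:

import requests

def fetch_month_html(url):
    # Download one month's page and UTF-8 encode it; per the transcript,
    # some characters in the HTML break the later file write otherwise.
    response = requests.get(url)
    response.raise_for_status()  # fail fast on HTTP errors
    return response.text.encode("utf-8")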
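
Then storing the page: the transcript describes checking for a per-year folder inside an html_data folder with os.path.exists, creating it with os.makedirs, and opening the month file in write-byte mode because plain write mode raises a bytes-vs-str error on the encoded payload. The exact folder names and casing below are assumptions; only the structure (a year folder holding one <month>.html file per month) comes from the transcript.

import os

def save_month_html(year, month, payload, root="Data/Html_Data"):
    # Create Data/Html_Data/<year>/ on first use, then write <month>.html.
    # "wb" (write-bytes) mode avoids the bytes issue the video mentions
    # when writing the UTF-8 encoded payload with plain "w" mode.
    folder = os.path.join(root, str(year))
    if not os.path.exists(folder):
        os.makedirs(folder)
    with open(os.path.join(folder, "{}.html".format(month)), "wb") as output:
        output.write(payload)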
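
Tying it together under a __main__ guard with the start/stop timing and the sys.stdout.flush call the video mentions, assuming the three helper sketches above are in scope:

import sys
import time

if __name__ == "__main__":
    start_time = time.time()
    for year in range(2013, 2019):
        for month in range(1, 13):
            url = build_url(year, month)                      # sketch above
            save_month_html(year, month, fetch_month_html(url))
        sys.stdout.flush()  # flush any buffered progress prints, as in the video
    stop_time = time.time()
    print("Time taken: {}".format(stop_time - start_time))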
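
Finally, a preview of the next session's step, where Beautiful Soup pulls the climate table out of each saved page. The table class name is a guess, flagged as such in the code, since the video only points at "this particular class" while inspecting the page.

from bs4 import BeautifulSoup

def extract_table_rows(html_path, table_class="medias"):
    # "medias" is a placeholder guess for the table's CSS class;
    # inspect the saved HTML to find the real one.
    with open(html_path, "rb") as page:
        soup = BeautifulSoup(page, "html.parser")
    table = soup.find("table", attrs={"class": table_class})
    if table is None:
        return []
    return [[cell.get_text(strip=True) for cell in row.find_all("td")]
            for row in table.find_all("tr")]

Calling extract_table_rows("Data/Html_Data/2013/1.html") would then yield the January 2013 rows as lists of cell strings, ready for the pre-processing and feature-engineering stages the video outlines.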