title
Data Analytics Full Course - 12 Hours | Data Analytics Python | Data Analytics Course 2024 | Edureka
description
🔥𝐄𝐝𝐮𝐫𝐞𝐤𝐚 𝐃𝐚𝐭𝐚 𝐀𝐧𝐚𝐥𝐲𝐬𝐭 𝐂𝐨𝐮𝐫𝐬𝐞 (𝐔𝐬𝐞 𝐂𝐨𝐝𝐞 "𝐘𝐎𝐔𝐓𝐔𝐁𝐄𝟐𝟎") : https://www.edureka.co/masters-program/data-analyst-certification
In this Edureka Data Analyst Full Course video, you will learn what data analytics is, why it is necessary, its main types, and its various applications. You will then work through a case study and analyze data. Finally, we’ll cover the top interview questions that will help you crack a data analyst interview.
00:00:00 Introduction
00:01:12 Agenda
00:03:31 What is Data Analytics
00:05:13 Why Become a Data Analyst
00:06:52 Job Description of Data Analyst
00:34:48 What is Numpy?
00:46:39 Numpy Operations
01:02:43 Numpy Special Function
01:05:15 Pandas
02:07:50 Why Data Visualization
02:10:22 What is Data Visualization?
02:31:00 Plots
02:38:16 Working with Multiple Plots
02:40:48 Seaborn vs Matplotlib
02:58:34 What is Data?
02:59:24 Types of Data
03:07:01 Variables
03:12:56 Sample
03:18:51 Statistics in data analytics
03:29:00 Information Gain and entropy
03:39:09 What is Probability?
03:45:04 Data and distribution
03:48:27 Python Use Case: Poker Probability
04:04:31 Introduction to Hypothesis testing
04:23:58 Types of errors
04:35:48 Industry Demonstration
04:56:06 SQL Basics
05:51:04 Why Time series Analysis?
05:52:46 What is Time series
06:02:54 What is Stationarity?
06:13:03 Problem Statement
07:35:55 Charts in Excel
08:08:54 Data Analysis using Excel
08:14:49 Covid-19 Data Analysis Project using Python
08:41:52 Benefits of Data Visualization
08:44:25 Data Visualization in Python
09:17:24 World's best playing XI: Analyze FIFA Data
09:39:23 Introduction to Recession
10:03:54 Analytics market in a nutshell
10:09:50 Skills required
10:12:22 Job description
10:24:12 Top 10 Data Analytics tools
10:36:02 Top 10 skills required
10:45:21 Who is Business Analyst?
10:56:20 Role of Data Scientist and data analyst
11:16:38 Interview questions and answers
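The NumPy chapters cover array attributes such as size and shape. They also compare summing a million elements in a Python list versus a NumPy array; the video reports roughly 208 ms versus 67 ms. The sketch below reproduces those ideas and assumes NumPy is installed.

One correction to the transcript: a.shape for a seven-element 1-D array is (7,). That means one axis of length 7, not "seven columns and no rows." Timings depend on the machine, so none are shown as expected output.

```python
import timeit

import numpy as np

# A 1-D array with seven elements, as in the video's running example.
a = np.array([1, 2, 3, 4, 5, 6, 7])
print(a.size)   # 7 -> total number of elements
print(a.shape)  # (7,) -> a single axis of length 7

# A 2-D array with four rows and three columns, like the
# multidimensional array pictured in the NumPy introduction.
b = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])
print(b.ndim)   # 2
print(b.shape)  # (4, 3) -> (rows, columns)

# Summing a million elements: built-in list vs. NumPy array.
n = 1_000_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)  # int64 avoids overflow on any platform

t_list = timeit.timeit(lambda: sum(py_list), number=10)
t_array = timeit.timeit(lambda: np_array.sum(), number=10)
print(f"list: {t_list:.4f} s, array: {t_array:.4f} s (10 runs each)")
```

The array sum is usually faster because NumPy loops over a contiguous block of fixed-type values in compiled code. The list version has to handle each element as a separate Python object.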
🔴 Subscribe to our channel to get video updates. Hit the subscribe button above: https://goo.gl/6ohpTV
🔴 𝐄𝐝𝐮𝐫𝐞𝐤𝐚 𝐎𝐧𝐥𝐢𝐧𝐞 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐚𝐧𝐝 𝐂𝐞𝐫𝐭𝐢𝐟𝐢𝐜𝐚𝐭𝐢𝐨𝐧𝐬
🔵 DevOps Online Training: http://bit.ly/3VkBRUT
🌕 AWS Online Training: http://bit.ly/3ADYwDY
🔵 React Online Training: http://bit.ly/3Vc4yDw
🌕 Tableau Online Training: http://bit.ly/3guTe6J
🔵 Power BI Online Training: http://bit.ly/3VntjMY
🌕 Selenium Online Training: http://bit.ly/3EVDtis
🔵 PMP Online Training: http://bit.ly/3XugO44
🌕 Salesforce Online Training: http://bit.ly/3OsAXDH
🔵 Cybersecurity Online Training: http://bit.ly/3tXgw8t
🌕 Java Online Training: http://bit.ly/3tRxghg
🔵 Big Data Online Training: http://bit.ly/3EvUqP5
🌕 RPA Online Training: http://bit.ly/3GFHKYB
🔵 Python Online Training: http://bit.ly/3Oubt8M
🌕 Azure Online Training: http://bit.ly/3i4P85F
🔵 GCP Online Training: http://bit.ly/3VkCzS3
🌕 Microservices Online Training: http://bit.ly/3gxYqqv
🔵 Data Science Online Training: http://bit.ly/3V3nLrc
🌕 CEHv12 Online Training: http://bit.ly/3Vhq8Hj
🔵 Angular Online Training: http://bit.ly/3EYcCTe
🔴 𝐄𝐝𝐮𝐫𝐞𝐤𝐚 𝐑𝐨𝐥𝐞-𝐁𝐚𝐬𝐞𝐝 𝐂𝐨𝐮𝐫𝐬𝐞𝐬
🔵 DevOps Engineer Masters Program: http://bit.ly/3Oud9PC
🌕 Cloud Architect Masters Program: http://bit.ly/3OvueZy
🔵 Data Scientist Masters Program: http://bit.ly/3tUAOiT
🌕 Big Data Architect Masters Program: http://bit.ly/3tTWT0V
🔵 Machine Learning Engineer Masters Program: http://bit.ly/3AEq4c4
🌕 Business Intelligence Masters Program: http://bit.ly/3UZPqJz
🔵 Python Developer Masters Program: http://bit.ly/3EV6kDv
🌕 RPA Developer Masters Program: http://bit.ly/3OteYfP
🔵 Web Development Masters Program: http://bit.ly/3U9R5va
🌕 Computer Science Bootcamp Program : http://bit.ly/3UZxPBy
🔵 Cyber Security Masters Program: http://bit.ly/3U25rNR
🌕 Full Stack Developer Masters Program : http://bit.ly/3tWCE2S
🔴 𝐄𝐝𝐮𝐫𝐞𝐤𝐚 𝐔𝐧𝐢𝐯𝐞𝐫𝐬𝐢𝐭𝐲 𝐏𝐫𝐨𝐠𝐫𝐚𝐦𝐬
🌕 Professional Certificate Program in DevOps with Purdue University: https://bit.ly/3Ov52lT
🔵 Advanced Certificate Program in Data Science with E&ICT Academy, IIT Guwahati: http://bit.ly/3V7ffrh
🌕 Artificial Intelligence and Machine Learning PGD with E&ICT Academy, NIT Warangal: http://bit.ly/3OuZ3xs
📢📢 𝐓𝐨𝐩 𝟏𝟎 𝐓𝐫𝐞𝐧𝐝𝐢𝐧𝐠 𝐓𝐞𝐜𝐡𝐧𝐨𝐥𝐨𝐠𝐢𝐞𝐬 𝐭𝐨 𝐋𝐞𝐚𝐫𝐧 𝐢𝐧 2023 𝐒𝐞𝐫𝐢𝐞𝐬 📢📢
⏩ NEW Top 10 Technologies To Learn In 2023 - https://youtu.be/udD_GQVDt5g
📌𝐓𝐞𝐥𝐞𝐠𝐫𝐚𝐦: https://t.me/edurekaupdates
📌𝐓𝐰𝐢𝐭𝐭𝐞𝐫: https://twitter.com/edurekain
📌𝐋𝐢𝐧𝐤𝐞𝐝𝐈𝐧: https://www.linkedin.com/company/edureka
📌𝐈𝐧𝐬𝐭𝐚𝐠𝐫𝐚𝐦: https://www.instagram.com/edureka_learning/
📌𝐅𝐚𝐜𝐞𝐛𝐨𝐨𝐤: https://www.facebook.com/edurekaIN/
📌𝐒𝐥𝐢𝐝𝐞𝐒𝐡𝐚𝐫𝐞: https://www.slideshare.net/EdurekaIN
📌𝐂𝐚𝐬𝐭𝐛𝐨𝐱: https://castbox.fm/networks/505?country=IN
📌𝐌𝐞𝐞𝐭𝐮𝐩: https://www.meetup.com/edureka/
📌𝐂𝐨𝐦𝐦𝐮𝐧𝐢𝐭𝐲: https://www.edureka.co/community/
Got a question on the topic? Please share it in the comment section below and our experts will answer it for you.
Please write back to us at sales@edureka.co or call us at IND: 9606058406 / US: 18338555775 (toll-free) for more information.
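The detail transcript walks through web scraping with BeautifulSoup and urllib. The steps are:

1. Choose the URL and check that scraping is permitted, for example via robots.txt.
2. Inspect the page.
3. Find the tags that hold the data. On quotes.toscrape.com each quote sits in a div with class "quote".
4. Write the script, then extract and store the data.

The minimal sketch below follows those steps. It assumes the bs4 package is installed and that the quote text is in a span with class "text" inside each div. The helper names allowed_to_fetch, fetch, and extract_quotes are illustrative. The "html.parser" argument just selects Python's built-in HTML parser.

```python
from urllib.parse import urljoin
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

from bs4 import BeautifulSoup


def allowed_to_fetch(url, user_agent="*"):
    """Check the site's robots.txt before scraping (requires network access)."""
    robots = RobotFileParser(urljoin(url, "/robots.txt"))
    robots.read()
    return robots.can_fetch(user_agent, url)


def fetch(url):
    """Download a page with urllib and decode it as UTF-8 (requires network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def extract_quotes(html):
    """Return the text of every <div class="quote"> found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")  # Python's built-in HTML parser
    quotes = []
    for div in soup.find_all("div", class_="quote"):
        # Assumed layout: the quote itself is in <span class="text">;
        # fall back to all of the div's text if that span is missing.
        span = div.find("span", class_="text")
        target = span if span is not None else div
        quotes.append(target.get_text(strip=True))
    return quotes


# Example usage (requires network access):
# url = "https://quotes.toscrape.com/"
# if allowed_to_fetch(url):
#     for quote in extract_quotes(fetch(url)):
#         print(quote)
```

The books.toscrape.com example follows the same pattern with soup.find_all("h3"). The visible link text inside each h3 appears to be shortened for long titles. Reading the a tag's "title" attribute may give the full book name, but verify this by inspecting the page.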
detail
{'title': 'Data Analytics Full Course - 12 Hours | Data Analytics Python | Data Analytics Course 2024 | Edureka', 'heatmap': [{'end': 2103.657, 'start': 1674.037, 'weight': 0.736}, {'end': 4635.037, 'start': 4202.584, 'weight': 0.784}], 'summary': 'The 12-hour data analytics full course by edureka covers topics such as statistics, python, sql, and projects like covid-19 data analysis and twitter sentiment analysis, with an expected 18% growth rate in data analytics jobs. it also delves into python libraries for data extraction, numpy, pandas data manipulation, data visualization using matplotlib and seaborn, probability, hypothesis testing, time series analysis, data exploration, and visualization, fifa world cup data analysis, analytics in economic downturns, fintech trends, and essential data analysis and business analyst skills, highlighting the demand for data analysts with 87,000 vacancies in the united states and 4,199 in india, and average base salaries of 7.5 lakh rupees in india and $75,000 in the united states.', 'chapters': [{'end': 185.406, 'segs': [{'end': 100.872, 'src': 'embed', 'start': 70.817, 'weight': 0, 'content': [{'end': 77.32, 'text': "We'll start with the introduction to Data Analytics, where we'll learn what Data Analytics is and why should we learn it.", 'start': 70.817, 'duration': 6.503}, {'end': 80.822, 'text': 'After which, we will see how to become a Data Analyst.', 'start': 77.94, 'duration': 2.882}, {'end': 88.826, 'text': 'We will then move ahead with our curriculum starting with NumPy, Pandas, Matplotlib, Seaborn.', 'start': 81.722, 'duration': 7.104}, {'end': 93.848, 'text': "Now, it's time to deep dive into the technical aspects of data analytics.", 'start': 90.006, 'duration': 3.842}, {'end': 100.872, 'text': "We'll start with the statistics which are essential for data analytics followed by SQL for data analytics.", 'start': 94.629, 'duration': 6.243}], 'summary': 'Introduction to data analytics, followed by learning data 
analyst skills and essential statistics and sql for data analytics.', 'duration': 30.055, 'max_score': 70.817, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA70817.jpg'}], 'start': 7.432, 'title': 'Data analytics: overview and career opportunities', 'summary': 'Discusses the high demand for data analytics professionals, with an expected growth rate of 18% in the next few years and provides an overview of the comprehensive curriculum for the edureka data analytics full course, including topics like statistics, sql, python, and projects like covid-19 data analysis and twitter sentiment analysis.', 'chapters': [{'end': 185.406, 'start': 7.432, 'title': 'Data analytics: overview and career opportunities', 'summary': 'Discusses the high demand for data analytics professionals, with an expected growth rate of 18% in the next few years, and provides an overview of the comprehensive curriculum for the edureka data analytics full course, including topics like statistics, sql, python, and projects like covid-19 data analysis and twitter sentiment analysis.', 'duration': 177.974, 'highlights': ['The demand for data analytics professionals is expected to grow at a rate of 18% in the next few years. The high growth rate of 18% for data analytics professionals indicates a significant increase in demand for skilled individuals in this field.', 'The curriculum covers topics such as statistics, SQL, Python, and projects like COVID-19 data analysis and Twitter sentiment analysis. The comprehensive curriculum includes essential topics like statistics, SQL, Python, and practical projects, providing a well-rounded understanding of data analytics.', 'Data analytics benefits the enterprises by performing proper market analysis and improving the business requirements. 
The application of data analytics in performing market analysis and improving business requirements showcases its tangible benefits for enterprises.']}], 'duration': 177.974, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA7432.jpg', 'highlights': ['The demand for data analytics professionals is expected to grow at a rate of 18% in the next few years.', 'The comprehensive curriculum includes essential topics like statistics, SQL, Python, and practical projects, providing a well-rounded understanding of data analytics.', 'The application of data analytics in performing market analysis and improving business requirements showcases its tangible benefits for enterprises.']}, {'end': 3083.598, 'segs': [{'end': 925.588, 'src': 'embed', 'start': 883.524, 'weight': 0, 'content': [{'end': 888.426, 'text': 'merging the documents page by page and we can crop the pages, merging multiple pages into a single page.', 'start': 883.524, 'duration': 4.902}, {'end': 891.367, 'text': 'We can encrypt and decrypt the PDF files as well.', 'start': 888.706, 'duration': 2.661}, {'end': 895.308, 'text': "And there are so many more features of this PyPDF2 library that I'm talking about.", 'start': 891.407, 'duration': 3.901}, {'end': 901.97, 'text': 'And then by being pure Python, it should run on any Python platform without any dependencies on external libraries as well.', 'start': 895.768, 'duration': 6.202}, {'end': 906.994, 'text': 'and it can also work entirely on string objects rather than file streams.', 'start': 902.49, 'duration': 4.504}, {'end': 916.021, 'text': 'so it is going to allow the pdf manipulation in memory as well and therefore is a useful tool for website that manage or manipulate pdfs.', 'start': 906.994, 'duration': 9.027}, {'end': 921.005, 'text': 'so in this demo, i will show you how we can read from different pages of a pdf file.', 'start': 916.021, 'duration': 4.984}, {'end': 925.588, 'text': "so let us go 
right to the pie charm, guys, and i'll show you how we can do this.", 'start': 921.005, 'duration': 4.583}], 'summary': 'Pypdf2 library allows merging, encrypting, decrypting, and manipulating pdfs using pure python, with features for in-memory manipulation and no external library dependencies.', 'duration': 42.064, 'max_score': 883.524, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA883524.jpg'}, {'end': 1200.621, 'src': 'embed', 'start': 1173.609, 'weight': 3, 'content': [{'end': 1176.71, 'text': "so i'll open the url for you guys, which is open weather map.", 'start': 1173.609, 'duration': 3.101}, {'end': 1180.491, 'text': 'So this is basically the URL that we are using, guys.', 'start': 1177.97, 'duration': 2.521}, {'end': 1185.114, 'text': 'And we are going to get the current data or the current weather data.', 'start': 1181.332, 'duration': 3.782}, {'end': 1194.638, 'text': 'So to call this, we have to use this API call in which we have a city name and our API key.', 'start': 1188.195, 'duration': 6.443}, {'end': 1196.739, 'text': 'The API key is unique for everybody.', 'start': 1195.079, 'duration': 1.66}, {'end': 1200.621, 'text': 'For that, you have to sign into this login page and follow certain instructions.', 'start': 1196.759, 'duration': 3.862}], 'summary': 'Using openweathermap api to access current weather data by city name and unique api key.', 'duration': 27.012, 'max_score': 1173.609, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA1173609.jpg'}, {'end': 1414.924, 'src': 'embed', 'start': 1391.03, 'weight': 15, 'content': [{'end': 1400.656, 'text': 'So this PDF file reader is basically, you know, it initializes a PDF file reader object and the operation can take some time as the PDF stream.', 'start': 1391.03, 'duration': 9.626}, {'end': 1402.857, 'text': 'cross reference tables are read into memory.', 'start': 1400.656, 'duration': 
2.201}, {'end': 1411.062, 'text': 'And then this get page method that we have over here, it basically retrieves a page number from this PDF file, whatever we are using over here.', 'start': 1403.297, 'duration': 7.765}, {'end': 1414.924, 'text': "And okay, let's move ahead with the API one.", 'start': 1411.902, 'duration': 3.022}], 'summary': 'Pdf file reader initializes object, retrieves page number, and loads cross-reference tables.', 'duration': 23.894, 'max_score': 1391.03, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA1391030.jpg'}, {'end': 1500.824, 'src': 'embed', 'start': 1461.59, 'weight': 5, 'content': [{'end': 1465.771, 'text': 'And for our program here, I have used the OpenWeatherMap API.', 'start': 1461.59, 'duration': 4.181}, {'end': 1471.153, 'text': 'And to use this API, you have to use a key in the URL that you have seen over here.', 'start': 1466.232, 'duration': 4.921}, {'end': 1472.234, 'text': 'After the city.', 'start': 1471.594, 'duration': 0.64}, {'end': 1478.516, 'text': "we have used this key, which is a secret key in my secret file over here, that I'm not obviously gonna show you,", 'start': 1472.234, 'duration': 6.282}, {'end': 1482.617, 'text': "because it's unique and you have to generate it for yourself.", 'start': 1478.516, 'duration': 4.101}, {'end': 1489.4, 'text': "So this can be a little exercise for you guys to figure out what kind of URL you're gonna use for this API.", 'start': 1483.398, 'duration': 6.002}, {'end': 1495.802, 'text': 'And if you search more on the internet, you can also find a lot more other APIs that you can use to get the data.', 'start': 1490.26, 'duration': 5.542}, {'end': 1497.403, 'text': 'This is a very simple example.', 'start': 1496.302, 'duration': 1.101}, {'end': 1500.824, 'text': "I'm just demonstrating how you can extract the data using Python.", 'start': 1497.423, 'duration': 3.401}], 'summary': 'Demonstrated usage of openweathermap api to 
extract data using python.', 'duration': 39.234, 'max_score': 1461.59, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA1461590.jpg'}, {'end': 2103.657, 'src': 'heatmap', 'start': 1674.037, 'weight': 0.736, 'content': [{'end': 1686.261, 'text': 'so from bs4 i am going to import beautiful soup, yes, and from url 3 dot.', 'start': 1674.037, 'duration': 12.224}, {'end': 1689.182, 'text': "so i'm going to use a response.", 'start': 1686.261, 'duration': 2.921}, {'end': 1693.203, 'text': 'import url.', 'start': 1689.182, 'duration': 4.021}, {'end': 1695.023, 'text': 'url open.', 'start': 1693.203, 'duration': 1.82}, {'end': 1696.484, 'text': 'okay, we have made a mistake over here.', 'start': 1695.023, 'duration': 1.461}, {'end': 1700.405, 'text': "it's not url lip 3, it's url lib, and now it should work.", 'start': 1696.484, 'duration': 3.921}, {'end': 1701.905, 'text': "fine, guys, i'm gonna do it again.", 'start': 1700.405, 'duration': 1.5}, {'end': 1706.837, 'text': 'url open.', 'start': 1705.996, 'duration': 0.841}, {'end': 1713.845, 'text': "now i'm going to paste the url, which is to scrape.com, and inside this we have a whole page.", 'start': 1706.837, 'duration': 7.008}, {'end': 1716.989, 'text': "okay, i'll just copy this url,", 'start': 1713.845, 'duration': 3.144}, {'end': 1728.736, 'text': "take it to the web page so that it's better to understand this what we are scraping actually and this is the web page that I'm going to scrape for data to extract data from this web page.", 'start': 1716.989, 'duration': 11.747}, {'end': 1731.617, 'text': 'the title itself says quotes to scrape.', 'start': 1728.736, 'duration': 2.881}, {'end': 1736.519, 'text': 'then we have all these codes that we can scrape inside our program and store it.', 'start': 1731.617, 'duration': 4.902}, {'end': 1742.961, 'text': 'also, and before that, I have to tell you a few things about the steps that are involved in web scraping, 
guys.', 'start': 1736.519, 'duration': 6.442}, {'end': 1746.343, 'text': "so first of all, you have to find the url that i've told you about.", 'start': 1743.661, 'duration': 2.682}, {'end': 1752.808, 'text': 'you have to be very specific when you are choosing a url for scraping the data because there are a few legality issues related to it.', 'start': 1746.343, 'duration': 6.465}, {'end': 1755.43, 'text': "i mean sometimes you don't have permission to scrape the data.", 'start': 1752.808, 'duration': 2.622}, {'end': 1758.412, 'text': 'sometimes the robot.txt file is not present,', 'start': 1755.43, 'duration': 2.982}, {'end': 1766.558, 'text': "so you never know if you have the permission or you have i mean if you don't have the permission and sometimes it's clearly written that you can scrape the data.", 'start': 1758.412, 'duration': 8.146}, {'end': 1771.991, 'text': 'After that you inspect the page and check out everything or actually you know,', 'start': 1767.168, 'duration': 4.823}, {'end': 1777.554, 'text': 'explore the web page to see how you can scrape the data and what kind of data are you trying to scrape from here?', 'start': 1771.991, 'duration': 5.563}, {'end': 1785.219, 'text': "And after that you have to find the data to extract that you want to extract and in our case I'm going to extract the quotes from the web page.", 'start': 1778.154, 'duration': 7.065}, {'end': 1789.38, 'text': 'so after this, after all these steps, the three steps,', 'start': 1786.059, 'duration': 3.321}, {'end': 1797.581, 'text': 'you have figured out what kind of data you are going to scrape from which web page and what are the tags that you are going to use from inspecting the element.', 'start': 1789.38, 'duration': 8.201}, {'end': 1801.922, 'text': 'then you write the python script using the beautiful soup and url lib open.', 'start': 1797.581, 'duration': 4.341}, {'end': 1805.083, 'text': 'then you extract the data and store it in a format.', 'start': 1801.922, 
'duration': 3.161}, {'end': 1807.584, 'text': "so we'll move to python again, guys.", 'start': 1805.083, 'duration': 2.501}, {'end': 1809.504, 'text': 'so we have our data, guys.', 'start': 1807.584, 'duration': 1.92}, {'end': 1815.869, 'text': "i'm sorry, we have our url and now i'm gonna use html is equal to url.", 'start': 1809.504, 'duration': 6.365}, {'end': 1820.651, 'text': 'open and provide url over here.', 'start': 1815.869, 'duration': 4.782}, {'end': 1824.972, 'text': "yes, and now i'm gonna take one variable soup.", 'start': 1820.651, 'duration': 4.321}, {'end': 1827.953, 'text': "i'm gonna use beautiful soup.", 'start': 1824.972, 'duration': 2.981}, {'end': 1833.955, 'text': "wait a minute, okay, so we're gonna begin again, beautiful soup.", 'start': 1827.953, 'duration': 6.002}, {'end': 1840.457, 'text': "inside this i'm gonna pass html and we'll write html dot, answer.", 'start': 1833.955, 'duration': 6.502}, {'end': 1856.64, 'text': "and now what we'll do is we'll write type sue and then we write all links is equal to sue dot, find all.", 'start': 1843.753, 'duration': 12.887}, {'end': 1863.963, 'text': 'and inside this we have to mention what we actually have to scrape from here.', 'start': 1856.64, 'duration': 7.323}, {'end': 1866.825, 'text': "so we're going to inspect this page, guys.", 'start': 1863.963, 'duration': 2.862}, {'end': 1876.998, 'text': "so we have opened inspect element and we'll check for what we have to scrape over here.", 'start': 1866.825, 'duration': 10.173}, {'end': 1883.099, 'text': 'so we have a span inside this we have the text, guys.', 'start': 1876.998, 'duration': 6.101}, {'end': 1894.481, 'text': "it's in the div class over here and the class is code yes, and for each it is same.", 'start': 1883.099, 'duration': 11.382}, {'end': 1897.042, 'text': 'so i think i know what we have to do here.', 'start': 1894.481, 'duration': 2.561}, {'end': 1902.583, 'text': "so we'll find all the div where the class is.", 'start': 
1897.042, 'duration': 5.541}, {'end': 1906.602, 'text': "i'm not drawing it was quote.", 'start': 1904.221, 'duration': 2.381}, {'end': 1923.95, 'text': "yes, and after this we take one more variable string cells and we're going to change all of it to strings and we are not going to get the clear text.", 'start': 1906.602, 'duration': 17.348}, {'end': 1927.891, 'text': "so we'll use a beautiful soup again inside this.", 'start': 1923.95, 'duration': 3.941}, {'end': 1937.835, 'text': 'i am going to pass string cells and then we can have html dot parser.', 'start': 1927.891, 'duration': 9.944}, {'end': 1942.357, 'text': 'html.parser is basically nothing, guys.', 'start': 1937.835, 'duration': 4.522}, {'end': 1948, 'text': "it's used to get the data in a format which is understandable to the user,", 'start': 1942.357, 'duration': 5.643}, {'end': 1954.083, 'text': 'because not every aspect of the text that you have on internet is readable to you guys.', 'start': 1948, 'duration': 6.083}, {'end': 1959.425, 'text': "sometimes it's in xml format and it has to be passed in html so that you will be able to understand it better.", 'start': 1954.083, 'duration': 5.342}, {'end': 1963.744, 'text': 'now we print the clear text.', 'start': 1960.262, 'duration': 3.482}, {'end': 1967.487, 'text': "let's see what the output is all right.", 'start': 1963.744, 'duration': 3.743}, {'end': 1971.689, 'text': 'so we have all the text from each div.', 'start': 1967.487, 'duration': 4.202}, {'end': 1974.791, 'text': 'so we have tags also all right.', 'start': 1971.689, 'duration': 3.102}, {'end': 1978.473, 'text': 'so this is how you scrape a data from a website, guys.', 'start': 1974.791, 'duration': 3.682}, {'end': 1984.797, 'text': 'and similarly i have written one more code for another url, that is books to scrape.com.', 'start': 1978.473, 'duration': 6.324}, {'end': 1992.356, 'text': "so i'll just take you to this url first of all, and then i'll show you what i've done over here, 
changed nothing from the code.", 'start': 1984.797, 'duration': 7.559}, {'end': 1995.158, 'text': "so it's basically the same.", 'start': 1992.356, 'duration': 2.802}, {'end': 1997.02, 'text': 'and then we have this page.', 'start': 1995.158, 'duration': 1.862}, {'end': 2005.043, 'text': "okay, i'll just this web page, which is a basically, you know, uh, books to scrape that we have over here.", 'start': 1997.02, 'duration': 8.023}, {'end': 2006.844, 'text': 'so it has all these books that i can scrape.', 'start': 2005.043, 'duration': 1.801}, {'end': 2013.085, 'text': 'it is basically a e-commerce platform replica that you can think of it as the e-commerce platform.', 'start': 2006.844, 'duration': 6.241}, {'end': 2019.987, 'text': "and then let's say you want to, uh, scrape the name of a book or name of all these books on this page.", 'start': 2013.085, 'duration': 6.902}, {'end': 2024.168, 'text': 'so what you will inspect, the element check for.', 'start': 2019.987, 'duration': 4.181}, {'end': 2026.94, 'text': 'So we want this.', 'start': 2026.119, 'duration': 0.821}, {'end': 2029.523, 'text': 'So we want the href in h3.', 'start': 2027.601, 'duration': 1.922}, {'end': 2035.248, 'text': "So what I've done over here is we have found all the h3 and got the text from it.", 'start': 2029.803, 'duration': 5.445}, {'end': 2036.429, 'text': 'So let me run this.', 'start': 2035.388, 'duration': 1.041}, {'end': 2042.335, 'text': "And you'll see that we are getting the output as somewhat like all the names of the books over here.", 'start': 2037.971, 'duration': 4.364}, {'end': 2047.24, 'text': 'So this is also a very simple example of web scraping that you can do guys.', 'start': 2043.056, 'duration': 4.184}, {'end': 2059.806, 'text': 'So what is NumPy? 
NumPy is basically a module or you can say a library that is available in Python for scientific computing.', 'start': 2053.143, 'duration': 6.663}, {'end': 2061.427, 'text': 'Now it contains a lot of things.', 'start': 2060.085, 'duration': 1.342}, {'end': 2066.949, 'text': 'It contains a powerful n-dimensional array object, then tools for integrating with C, C++.', 'start': 2061.726, 'duration': 5.223}, {'end': 2072.032, 'text': 'It is also very useful in linear algebra, Fourier transform and random number capabilities.', 'start': 2067.19, 'duration': 4.842}, {'end': 2080.141, 'text': 'Now let me tell you guys numpy can also be used as an efficient multidimensional container for data for generic data.', 'start': 2072.577, 'duration': 7.564}, {'end': 2082.922, 'text': 'Now let me tell you what exactly is multidimensional array.', 'start': 2080.46, 'duration': 2.462}, {'end': 2086.083, 'text': 'Now over here this picture actually depicts multidimensional array.', 'start': 2083.181, 'duration': 2.902}, {'end': 2089.465, 'text': 'So we have various elements that are stored in their respective memory locations.', 'start': 2086.304, 'duration': 3.161}, {'end': 2092.047, 'text': 'So we have one two threes in their own memory locations.', 'start': 2089.485, 'duration': 2.562}, {'end': 2093.687, 'text': 'Now why is it two dimensional.', 'start': 2092.347, 'duration': 1.34}, {'end': 2097.149, 'text': 'It is two dimensional because it has rows as well as columns.', 'start': 2094.007, 'duration': 3.142}, {'end': 2100.791, 'text': 'So you can see we have three columns and we have four rows available.', 'start': 2097.569, 'duration': 3.222}, {'end': 2103.657, 'text': 'So that is the reason why it becomes a two-dimensional array.', 'start': 2101.155, 'duration': 2.502}], 'summary': 'Demonstration of web scraping with python and beautiful soup, and explanation of numpy features.', 'duration': 429.62, 'max_score': 1674.037, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA1674037.jpg'}, {'end': 1840.457, 'src': 'embed', 'start': 1809.504, 'weight': 13, 'content': [{'end': 1815.869, 'text': "i'm sorry, we have our url and now i'm gonna use html is equal to url.", 'start': 1809.504, 'duration': 6.365}, {'end': 1820.651, 'text': 'open and provide url over here.', 'start': 1815.869, 'duration': 4.782}, {'end': 1824.972, 'text': "yes, and now i'm gonna take one variable soup.", 'start': 1820.651, 'duration': 4.321}, {'end': 1827.953, 'text': "i'm gonna use beautiful soup.", 'start': 1824.972, 'duration': 2.981}, {'end': 1833.955, 'text': "wait a minute, okay, so we're gonna begin again, beautiful soup.", 'start': 1827.953, 'duration': 6.002}, {'end': 1840.457, 'text': "inside this i'm gonna pass html and we'll write html dot, answer.", 'start': 1833.955, 'duration': 6.502}], 'summary': 'Using html and beautiful soup to process url data.', 'duration': 30.953, 'max_score': 1809.504, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA1809504.jpg'}, {'end': 2860.334, 'src': 'embed', 'start': 2832.78, 'weight': 1, 'content': [{'end': 2839.603, 'text': "If I go on and add some more elements, say four, five, six, seven, then if I execute this, you'll see that seven elements have appeared.", 'start': 2832.78, 'duration': 6.823}, {'end': 2842.143, 'text': 'That means the total number of elements in my array is seven.', 'start': 2839.623, 'duration': 2.52}, {'end': 2844.985, 'text': 'Then comes the shape part that I was talking about.', 'start': 2842.584, 'duration': 2.401}, {'end': 2851.987, 'text': "So in order to find the shape, what you can do is you can just type in here a.shape and it'll give you the shape.", 'start': 2845.505, 'duration': 6.482}, {'end': 2853.548, 'text': 'So let us see what happens.', 'start': 2852.087, 'duration': 1.461}, {'end': 2860.334, 'text': 'So it has seven columns but 
there are no rows so it has given seven comma blank.', 'start': 2854.83, 'duration': 5.504}], 'summary': 'Array contains 7 elements and has a shape of 7 columns with no rows.', 'duration': 27.554, 'max_score': 2832.78, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA2832780.jpg'}], 'start': 185.826, 'title': 'Data analytics and python data tools', 'summary': "Discusses the growing popularity of data analytics, demand for data analysts in india and the us job markets, average salaries, top companies hiring data analysts, skills and tools required, and the process of data extraction using python libraries like request, scrapy, and beautiful soup, with a demonstration of extracting data from a pdf using the pypdf2 library. it also covers extracting data from pdf files and apis in python, web scraping using python's beautifulsoup and urllib libraries, introduction to numpy in python, efficiency of using numpy arrays over lists, and various operations with numpy arrays, including practical examples showcasing their usage, and demonstrates a significant time difference of 208 milliseconds for lists and 67 milliseconds for numpy arrays when computing the sum of a million elements.", 'chapters': [{'end': 676.962, 'start': 185.826, 'title': 'Data analytics insights', 'summary': 'Discusses the growing popularity of data analytics, the demand for data analysts in india and the us job markets, the average salaries, the top companies hiring data analysts, the skills and tools required, and the steps to become a data analyst.', 'duration': 491.136, 'highlights': ['The demand for data analysts in India and the US job markets is significant, with over 35,000 job vacancies in India and over 174,000 in the US, along with high demand in specific regions like Bangalore and California. 
Over 35,000 job vacancies in India and over 174,000 in the US for data analysts, with high demand in regions like Bangalore and California.', 'The average annual salary for a data analyst is 5 lakh in India and $69,517 in the US, indicating decent compensation for the role. The average annual salary for a data analyst is 5 lakh in India and $69,517 in the US.', 'Top companies hiring data analysts include Accenture, TCS, American Express, Google, KPMG, Capgemini, Mindtree, Deloitte, Flipkart, Cognizant, Dell, Ernst & Young, IBM, Mu Sigma, and Goldman Sachs. Top companies hiring data analysts include Accenture, TCS, American Express, Google, KPMG, Capgemini, Mindtree, Deloitte, Flipkart, Cognizant, Dell, Ernst & Young, IBM, Mu Sigma, and Goldman Sachs.', 'Skills and tools required for a data analyst include SQL, Python, R, Alteryx, Tableau, Hadoop, Big Data, relational database, neural networks, deep learning, and AI, along with organizational, analytical, database concepts, and programming skills. Skills and tools required for a data analyst include SQL, Python, R, Alteryx, Tableau, Hadoop, Big Data, relational database, neural networks, deep learning, AI, organizational, analytical, database concepts, and programming skills.', 'The steps to become a data analyst involve creating a learning plan, building technical skills, working on real-time projects, developing a portfolio, practicing presenting findings, and obtaining certifications or a degree course. 
Steps to become a data analyst involve creating a learning plan, building technical skills, working on real-time projects, developing a portfolio, practicing presenting findings, and obtaining certifications or a degree course.']}, {'end': 1083.426, 'start': 677.583, 'title': 'Python data extraction', 'summary': 'Discusses the process of data extraction, the use of python libraries like request, scrapy, and beautiful soup for data extraction, and a demonstration of extracting data from a pdf using the pypdf2 library.', 'duration': 405.843, 'highlights': ['The chapter discusses the process of data extraction and the use of Python libraries like request, scrapy, and beautiful soup for data extraction. It covers the different sources of data extraction, the purpose of data extraction, and the libraries in Python for data extraction such as request, scrapy, and beautiful soup.', 'Python has libraries like request, scrapy, and beautiful soup for data extraction. Python provides libraries such as request for making HTTP requests, scrapy for web crawling and data extraction, and beautiful soup for parsing HTML and extracting data.', 'A demonstration of extracting data from a PDF using the pyPDF2 library is provided. 
The demonstration includes a brief introduction to the PyPDF2 library, its capabilities for extracting document information, merging, splitting, encrypting, and decrypting PDF files, and its platform independence and support for manipulating PDFs in memory.']}, {'end': 1653.027, 'start': 1083.426, 'title': 'Data extraction in python: pdf and api', 'summary': 'Covers extracting data from pdf files using the pypdf2 library and from apis using the requests library in python, demonstrating data extraction and usage with examples and guidelines.', 'duration': 569.601, 'highlights': ['Extracting Data from PDFs Using PyPDF2 Library Demonstrated extraction of data from PDF files using the PyPDF2 library, displaying content from specific pages and explaining the getPage method.', 'Data Extraction Using APIs with Requests Library Illustrated data extraction from APIs using the requests library, fetching current weather data from the OpenWeatherMap API, with an emphasis on using a unique API key and handling response data.', 'Guidelines and Legality Issues in Web Scraping Addressed the controversial nature of web scraping, emphasizing the importance of adhering to website permissions and legality, citing a case of Craigslist suing a company and advising viewers to check the robots.txt file of a website for permissions.']}, {'end': 2429.071, 'start': 1653.027, 'title': 'Web scraping and numpy in python', 'summary': "Covers web scraping using python's beautifulsoup and urllib libraries, including the steps involved in web scraping, extracting data from two example websites, and also delves into an introduction to numpy in python, explaining its features, advantages over lists, and demonstrating its usage through practical examples.", 'duration': 776.044, 'highlights': ["The chapter covers web scraping using Python's BeautifulSoup and urllib libraries It discusses the process of web scraping using Python libraries BeautifulSoup and urllib, providing practical examples and steps involved.", 'Introduction to NumPy in 
Python, explaining its features and advantages over lists The chapter introduces NumPy in Python, explaining its features such as n-dimensional array object, integration with C, C++, and its utility in linear algebra, Fourier transform, and random number capabilities. It also highlights the advantages of NumPy over lists, including its memory efficiency, speed, and convenience.', 'Demonstration of NumPy usage through practical examples The chapter provides practical examples demonstrating the usage of NumPy in creating arrays, comparing memory usage with lists, and highlighting the speed and convenience of NumPy arrays over lists.']}, {'end': 3083.598, 'start': 2429.731, 'title': 'Using numpy for efficient array operations', 'summary': 'Elaborates on the efficiency of using numpy arrays over lists, with a demonstration of calculating the sum of lists and numpy arrays, showing a significant time difference of 208 milliseconds for lists and 67 milliseconds for numpy arrays when computing the sum of a million elements. it further explores various operations with numpy arrays, including finding the dimension, byte size, and data types of elements, as well as determining the size and shape of the array. 
additionally, it covers the reshape and slicing operations, with practical examples showcasing their usage.', 'duration': 653.867, 'highlights': ['NumPy array is faster and occupies less space compared to lists The time taken to compute the sum of a million elements was 208 milliseconds for lists and 67 milliseconds for NumPy arrays, demonstrating the significant efficiency of NumPy arrays over lists.', 'Demonstration of finding dimension, byte size, and data types of elements in a NumPy array The practical demonstration showcases how to find the dimension, byte size, and data types of elements stored in a NumPy array, providing a comprehensive understanding of the array structure.', 'Exploring operations such as finding size, shape, reshape, and slicing of NumPy arrays The chapter delves into various operations with NumPy arrays, including finding the size and shape of the array, performing reshape to alter its dimensions, and using slicing to extract specific elements, with practical examples to illustrate their application.']}], 'duration': 2897.772, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA185826.jpg', 'highlights': ['Over 35,000 job vacancies in India and over 174,000 in the US for data analysts, with high demand in regions like Bangalore and California.', 'The average annual salary for a data analyst is 5 lakh in India and $69,517 in the US.', 'Top companies hiring data analysts include Accenture, TCS, American Express, Google, KPMG, Capgemini, Mindtree, Deloitte, Flipkart, Cognizant, Dell, Ernst & Young, IBM, Mu Sigma, and Goldman Sachs.', 'Skills and tools required for a data analyst include SQL, Python, R, Alteryx, Tableau, Hadoop, Big Data, relational database, neural networks, deep learning, AI, organizational, analytical, database concepts, and programming skills.', 'The steps to become a data analyst involve creating a learning plan, building technical skills, working on real-time 
projects, developing a portfolio, practicing presenting findings, and earning certifications or completing a degree course.', 'The chapter discusses the process of data extraction and the use of Python libraries like requests, Scrapy, and Beautiful Soup for data extraction.', 'Python provides libraries such as requests for making HTTP requests, Scrapy for web crawling and data extraction, and Beautiful Soup for parsing HTML and extracting data.', 'A demonstration of extracting data from a PDF using the PyPDF2 library is provided.', 'Demonstrated extraction of data from PDF files using the PyPDF2 library, displaying content from specific pages and explaining the getPage method.', 'Illustrated data extraction from APIs using the requests library, fetching current weather data from the OpenWeatherMap API, with an emphasis on using a unique API key and handling response data.', 'Addressed the controversial nature of web scraping, emphasizing the importance of adhering to website permissions and legality, citing a case of Craigslist suing a company and advising viewers to check the robots.txt file of a website for permissions.', 'It discusses the process of web scraping using Python libraries BeautifulSoup and urllib, providing practical examples and steps involved.', 'The chapter introduces NumPy in Python, explaining its features such as n-dimensional array object, integration with C, C++, and its utility in linear algebra, Fourier transform, and random number capabilities.', 'The time taken to compute the sum of a million elements was 208 milliseconds for lists and 67 milliseconds for NumPy arrays, demonstrating the significant efficiency of NumPy arrays over lists.', 'The practical demonstration showcases how to find the dimension, byte size, and data types of elements stored in a NumPy array, providing a comprehensive understanding of the array structure.', 'The chapter delves into various operations with NumPy arrays, including finding the size and shape of the array, performing reshape to alter its 
dimensions, and using slicing to extract specific elements, with practical examples to illustrate their application.']}, {'end': 5107.71, 'segs': [{'end': 3940.442, 'src': 'embed', 'start': 3910.401, 'weight': 1, 'content': [{'end': 3916.845, 'text': 'We can work on ordered and unordered time series data, arbitrary matrix data with row and column labels.', 'start': 3910.401, 'duration': 6.444}, {'end': 3923.829, 'text': 'We can work on unlabeled data and we can also work on any other form of observational or statistical data sets.', 'start': 3917.305, 'duration': 6.524}, {'end': 3927.211, 'text': "Now, I'm going to tell you how you can install pandas on your systems, guys.", 'start': 3924.57, 'duration': 2.641}, {'end': 3929.073, 'text': "It's very easy to install Python pandas.", 'start': 3927.271, 'duration': 1.802}, {'end': 3933.795, 'text': 'You just go to your command line or terminal and just type pip install pandas,', 'start': 3929.093, 'duration': 4.702}, {'end': 3940.442, 'text': "or if you're working on an IDE such as PyCharm, you can just simply type in install pandas in your terminal over there,", 'start': 3933.795, 'duration': 6.647}], 'summary': "Pandas can work with various data types, and it's easy to install using pip or an ide like pycharm.", 'duration': 30.041, 'max_score': 3910.401, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA3910401.jpg'}, {'end': 4635.037, 'src': 'heatmap', 'start': 4202.584, 'weight': 0.784, 'content': [{'end': 4208.589, 'text': "I'll have to import numpy as well because I'm going to use it to create a null value.", 'start': 4202.584, 'duration': 6.005}, {'end': 4214.673, 'text': "All right, so DF is equal to I'm going to make a series.", 'start': 4208.609, 'duration': 6.064}, {'end': 4228.005, 'text': "So I'll just write it as All right Series and I'm gonna pass a list of values.", 'start': 4214.693, 'duration': 13.312}, {'end': 4243.234, 'text': "Let's say 1 2 3 
4 5 6 and I'm gonna use my numpy NaN to create a null value, 8 9 and one more value.", 'start': 4228.045, 'duration': 15.189}, {'end': 4248.409, 'text': "Let's say 10. So it's going to create a series. Now I print S.", 'start': 4243.254, 'duration': 5.155}, {'end': 4253.113, 'text': 'So we have a series which has default indexes that are already there and all these values that I passed inside a list.', 'start': 4248.409, 'duration': 4.704}, {'end': 4257.778, 'text': 'So this is how you create a series in Python using pandas, guys. After this,', 'start': 4253.574, 'duration': 4.204}, {'end': 4259.88, 'text': "I'm going to tell you how you create a data frame.", 'start': 4257.818, 'duration': 2.062}, {'end': 4261.061, 'text': 'So for that also,', 'start': 4260.28, 'duration': 0.781}, {'end': 4267.207, 'text': "I'm going to tell you how you create a data frame using a dictionary object and how you can create a data frame using series as well.", 'start': 4261.061, 'duration': 6.146}, {'end': 4275.158, 'text': 'So now what we are going to do is we are going to create a data frame by passing a numpy array with a datetime index and labeled columns.', 'start': 4268.757, 'duration': 6.401}, {'end': 4278.099, 'text': "So I'll take one variable.", 'start': 4275.718, 'duration': 2.381}, {'end': 4285, 'text': "Let's say date or dates and just type it as D and I'm going to take PD dot.", 'start': 4278.119, 'duration': 6.881}, {'end': 4289.841, 'text': "So we're going to use the date range, and after this I'm going to pass a few values.", 'start': 4285.62, 'duration': 4.221}, {'end': 4295.162, 'text': "Let's say 2020 and I'm going to pass values like we're in the month of March.", 'start': 4289.881, 'duration': 5.281}, {'end': 4297.022, 'text': "So I'll just write it as March.", 'start': 4295.222, 'duration': 1.8}, {'end': 4304.636, 'text': "and after this I'm gonna take periods, which is equal to let's say 10.", 'start': 4297.933, 'duration': 6.703}, {'end': 4307.177, 'text': 'So this 
is my date range, guys.', 'start': 4304.636, 'duration': 2.541}, {'end': 4315.04, 'text': 'Okay, I have an invalid syntax, right? Should work fine now.', 'start': 4307.777, 'duration': 7.263}, {'end': 4319.942, 'text': 'So when I print D over here, I have all these values in date range format.', 'start': 4315.06, 'duration': 4.882}, {'end': 4329.369, 'text': "After this, what I'm going to do is I am going to take one data frame, which I am going to take as DF for obvious reasons, to make it clearer,", 'start': 4320.702, 'duration': 8.667}, {'end': 4333.331, 'text': "and I'm going to take data frame and inside this I'm going to pass a few values.", 'start': 4329.369, 'duration': 3.962}, {'end': 4335.612, 'text': "So first of all, I'm going to take a few random values.", 'start': 4333.351, 'duration': 2.261}, {'end': 4344.977, 'text': "So I'm going to use NP dot random dot randn, and inside this I'm going to pass 10 and, let's say, four.", 'start': 4336.292, 'duration': 8.685}, {'end': 4353.622, 'text': "Now I'm going to give the index values as D, and I'm going to have to pass a few more values, which are the columns.", 'start': 4346.198, 'duration': 7.424}, {'end': 4358.957, 'text': "So I'll pass the columns as a list.", 'start': 4355.575, 'duration': 3.382}, {'end': 4361.198, 'text': "I'm gonna take, let's say, four columns.", 'start': 4359.597, 'duration': 1.601}, {'end': 4364.06, 'text': "So I'm just gonna take, okay, wait a minute.", 'start': 4361.218, 'duration': 2.842}, {'end': 4373.965, 'text': 'A, B, C, D.', 'start': 4364.08, 'duration': 9.885}, {'end': 4375.626, 'text': 'All right, do we have any errors? 
No.', 'start': 4373.965, 'duration': 1.661}, {'end': 4378.047, 'text': "So now I'm gonna print my data frame.", 'start': 4376.347, 'duration': 1.7}, {'end': 4381.93, 'text': 'So I have a data frame, guys, which I have created using, you know, passing a numpy array.', 'start': 4378.067, 'duration': 3.863}, {'end': 4384.131, 'text': 'And I have a datetime index', 'start': 4382.59, 'duration': 1.541}, {'end': 4387.77, 'text': 'with labeled columns, which are A, B, C and D.', 'start': 4384.929, 'duration': 2.841}, {'end': 4391.811, 'text': 'This is my index, guys, and I have all these random values using an NP array.', 'start': 4387.77, 'duration': 4.041}, {'end': 4395.472, 'text': 'So this is how you create a data frame; it is just a simple example,', 'start': 4392.451, 'duration': 3.021}, {'end': 4401.694, 'text': "and I'm going to show you how you can create a data frame by passing a dictionary of objects that can be, you know, converted into a series also.", 'start': 4395.472, 'duration': 6.222}, {'end': 4408.917, 'text': "So I'll take let's say again DF is equal to PD dot data frame.", 'start': 4402.455, 'duration': 6.462}, {'end': 4412.778, 'text': "And I'm going to pass a dictionary over here now.", 'start': 4410.717, 'duration': 2.061}, {'end': 4415.001, 'text': "So I'm gonna take a few values first of all.", 'start': 4413.28, 'duration': 1.721}, {'end': 4417.382, 'text': "So first value is let's say A.", 'start': 4415.621, 'duration': 1.761}, {'end': 4426.465, 'text': "Now after that, I have to pass something, right? 
Okay, I'm gonna write, let's say, a list of one, two, three, and four.", 'start': 4417.382, 'duration': 9.083}, {'end': 4430.907, 'text': "After this, my next value is gonna be, let's say B.", 'start': 4428.206, 'duration': 2.701}, {'end': 4436.069, 'text': "And I'm going to pass a timestamp, let's say.", 'start': 4430.907, 'duration': 5.162}, {'end': 4444.063, 'text': "And for timestamp, I'm gonna use the same I have used over here, 2020.", 'start': 4438.69, 'duration': 5.373}, {'end': 4458.654, 'text': "And after this I'm gonna pass one more value, let's say C, and I'm going to use a series now, a series object.", 'start': 4444.063, 'duration': 14.591}, {'end': 4477.73, 'text': "And inside this I'm gonna pass one and the index is going to be, let's say, range index is equal to a list with a range of four,", 'start': 4461.416, 'duration': 16.314}, {'end': 4480.112, 'text': 'because we have only four values over here.', 'start': 4477.73, 'duration': 2.382}, {'end': 4481.413, 'text': "We don't want any null values.", 'start': 4480.172, 'duration': 1.241}, {'end': 4486.938, 'text': 'And after this, I have to type in the data type as well, the data type of the series, guys.', 'start': 4482.474, 'duration': 4.464}, {'end': 4493.463, 'text': "So for that, I use dtype is equal to, let's say, float 32.", 'start': 4488.159, 'duration': 5.304}, {'end': 4497.246, 'text': 'All right.', 'start': 4493.463, 'duration': 3.783}, {'end': 4500.188, 'text': 'After that, I provide my next value, which is d.', 'start': 4498.107, 'duration': 2.081}, {'end': 4506.179, 'text': "Now for d, I'm gonna use a numpy array.", 'start': 4502.737, 'duration': 3.442}, {'end': 4519.325, 'text': "And for this, I'm gonna pass a value, let's say, not three, let's say five multiplied by four.", 'start': 4510.141, 'duration': 9.184}, {'end': 4527.75, 'text': "And let's take the dtype is equal to integer 32.", 'start': 4521.246, 'duration': 6.504}, {'end': 4529.451, 'text': 'Yes, all right.', 'start': 
4527.75, 'duration': 1.701}, {'end': 4532.152, 'text': 'Now I take my final value, which is gonna be e.', 'start': 4530.291, 'duration': 1.861}, {'end': 4540.372, 'text': "and inside this I'm gonna pass a data frame or we're gonna use the categorical object, guys.", 'start': 4533.43, 'duration': 6.942}, {'end': 4542.893, 'text': "We're gonna talk about this later on in the session, so don't worry.", 'start': 4540.692, 'duration': 2.201}, {'end': 4549.254, 'text': "I'm just showing you how you can create a data frame using all these objects that we have at our disposal, guys.", 'start': 4543.193, 'duration': 6.061}, {'end': 4559.817, 'text': 'Now instead of test and train, we can just call them true or false.', 'start': 4555.876, 'duration': 3.941}, {'end': 4561.478, 'text': "Doesn't matter.", 'start': 4560.977, 'duration': 0.501}, {'end': 4569.666, 'text': 'We are taking categorical, so it has to be either true or false, or it can be zero or one,', 'start': 4561.478, 'duration': 8.188}, {'end': 4573.209, 'text': 'but it has to be decisive in a way that there are only two values.', 'start': 4569.666, 'duration': 3.543}, {'end': 4581.535, 'text': "So I take another value, and for this, let's just say I give the value edureka.", 'start': 4574.129, 'duration': 7.406}, {'end': 4590.362, 'text': 'All right, so our dictionary is done over here, so we have created our data frame, guys.', 'start': 4581.675, 'duration': 8.687}, {'end': 4596.01, 'text': "There's no error, and now when I print this, we have our data frame, guys.", 'start': 4590.902, 'duration': 5.108}, {'end': 4602.351, 'text': 'So A, B, C, D, E and F: we have all these values using different data types, or we can call them objects as well.', 'start': 4596.03, 'duration': 6.321}, {'end': 4609.754, 'text': "So for that also we can check the data frame: we write dtypes, and it's going to give us all the data types that we have.", 'start': 4602.672, 'duration': 7.082}, {'end': 4615.895, 'text': 'So we have date 
time stamps over here, integer, float, integer, category, and an object because I have used a string over here.', 'start': 4610.274, 'duration': 5.621}, {'end': 4620.517, 'text': "That's why it is giving us an object, but in the new release, that is pandas 1.0.0,", 'start': 4616.676, 'duration': 3.841}, {'end': 4622.457, 'text': "it's not going to be an object.", 'start': 4620.517, 'duration': 1.94}, {'end': 4624.238, 'text': "It's going to show you that it is a string.", 'start': 4622.477, 'duration': 1.761}, {'end': 4633.096, 'text': "So don't worry, guys, and we have already made a video on Python pandas 1.0.0 with all the features that have come with the new release,", 'start': 4625.054, 'duration': 8.042}, {'end': 4635.037, 'text': 'the new stable release that came out last month.', 'start': 4633.096, 'duration': 1.941}], 'summary': 'Creating series and data frames using pandas and numpy with examples and data types.', 'duration': 432.453, 'max_score': 4202.584, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA4202584.jpg'}, {'end': 4590.362, 'src': 'embed', 'start': 4555.876, 'weight': 2, 'content': [{'end': 4559.817, 'text': 'Now instead of test and train, we can just call them true or false.', 'start': 4555.876, 'duration': 3.941}, {'end': 4561.478, 'text': "Doesn't matter.", 'start': 4560.977, 'duration': 0.501}, {'end': 4569.666, 'text': 'We are taking categorical, so it has to be either true or false, or it can be zero or one,', 'start': 4561.478, 'duration': 8.188}, {'end': 4573.209, 'text': 'but it has to be decisive in a way that there are only two values.', 'start': 4569.666, 'duration': 3.543}, {'end': 4581.535, 'text': "So I take another value, and for this, let's just say I give the value edureka.", 'start': 4574.129, 'duration': 7.406}, {'end': 4590.362, 'text': 'All right, so our dictionary is done over here, so we have created our data frame, guys.', 'start': 4581.675, 'duration': 8.687}], 'summary': 
"Data frame created with a categorical column of true/false values and a string column holding the value 'edureka'.", 'duration': 34.486, 'max_score': 4555.876, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA4555876.jpg'}, {'end': 5041.922, 'src': 'embed', 'start': 5009.432, 'weight': 0, 'content': [{'end': 5013.415, 'text': "So it's going to give me the values accordingly, which I pass over here.", 'start': 5009.432, 'duration': 3.983}, {'end': 5019.119, 'text': 'So instead of A, I can write B or I can write D.', 'start': 5013.895, 'duration': 5.224}, {'end': 5031.358, 'text': "So this is how you can select on multiple axes using labels, guys, and I have written this over here, so I can just write, let's say, 0 2 3.", 'start': 5019.119, 'duration': 12.239}, {'end': 5032.08, 'text': "Let's see what happens.", 'start': 5031.36, 'duration': 0.72}, {'end': 5033.52, 'text': 'Oh, we have an error, guys.', 'start': 5032.48, 'duration': 1.04}, {'end': 5034.341, 'text': 'You cannot do this.', 'start': 5033.56, 'duration': 0.781}, {'end': 5041.922, 'text': "So we'll now move on to the next topic that we have, which is showing label slicing.", 'start': 5037.241, 'duration': 4.681}], 'summary': 'Demonstrating multi-axis selection using labels and encountering an error while attempting to select specific values.', 'duration': 32.49, 'max_score': 5009.432, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA5009432.jpg'}], 'start': 3083.598, 'title': 'Numpy and pandas data manipulation', 'summary': 'Explains numpy array slicing and np.linspace, covering operations like reshaping, finding min/max/sum, mathematical functions, and element-wise operations. 
it details vertical and horizontal stacking, plotting sine and cosine graphs, and demonstrates data frame creation and manipulation using pandas.', 'chapters': [{'end': 3154.644, 'start': 3083.598, 'title': 'Python array slicing and numpy linspace', 'summary': 'Explains how to perform slicing in numpy arrays to extract specific elements and demonstrates the use of np.linspace to generate equally spaced values within a range, as evidenced by the extraction of 4 and 6 through slicing and the output of 5 equally spaced values between 1 and 3.', 'duration': 71.046, 'highlights': ['The chapter demonstrates the slicing of numpy arrays to extract specific elements, as seen with the extraction of 4 and 6 using slicing.', 'It also showcases the use of np.linspace to generate equally spaced values within a range, exemplified by the output of 5 values between 1 and 3.']}, {'end': 3486.622, 'start': 3154.924, 'title': 'Numpy array operations', 'summary': 'Covers basic numpy array operations such as reshaping, slicing, finding minimum, maximum, and sum, as well as mathematical functions like finding the square root and standard deviation of elements, and element-wise addition, multiplication, subtraction, and division of numpy arrays.', 'duration': 331.698, 'highlights': ['The chapter covers basic numpy array operations such as reshaping, slicing, finding minimum, maximum, and sum. It demonstrates reshaping and slicing of numpy arrays and shows practical examples of finding the minimum, maximum, and sum of numpy arrays.', 'Mathematical functions like finding the square root and standard deviation of elements are explained. It illustrates how to find the square root and standard deviation of elements in numpy arrays using practical examples.', 'Element-wise addition, multiplication, subtraction, and division of numpy arrays are demonstrated. 
It explains how to perform element-wise addition, multiplication, subtraction, and division of numpy arrays with practical examples.']}, {'end': 3741.484, 'start': 3487.203, 'title': 'Numpy array operations', 'summary': 'Details element-wise addition, subtraction, multiplication, and division using numpy arrays, as well as vertical and horizontal stacking and plotting sine and cosine graphs using matplotlib.', 'duration': 254.281, 'highlights': ['Performing addition, subtraction, multiplication, and division using NumPy arrays with examples of 1+1=2, 2+2=4, 3+3=6, 1-1=0, 2-2=0, 3-3=0, 1*1=1, 2*2=4, 3*3=9, 1/1=1, 2/2=1, 3/3=1.', 'Demonstrating vertical and horizontal stacking of arrays, showcasing np.vstack and np.hstack with clear examples.', 'Flattening a NumPy array into a one-dimensional array using the np.ravel function, illustrated with a specific example.', 'Plotting sine and cosine graphs using matplotlib, with a demonstration of defining coordinates, plotting the graph, and displaying the result using plt.show().']}, {'end': 4391.811, 'start': 3741.904, 'title': 'Python pandas: data manipulation and analysis', 'summary': 'Covers the usage of numpy for exponential and logarithmic functions, the installation of python pandas for data manipulation, and the creation of series and data frames using pandas, with a focus on the practical application of pandas in data analysis and manipulation.', 'duration': 649.907, 'highlights': ['The chapter starts with the demonstration of numpy functionalities for exponential and logarithmic functions, showcasing the calculation of exponential values for a given numpy array and the distinction between natural log and log base 10. 
demonstration of numpy functionalities, calculation of exponential values, distinction between natural log and log base 10', 'The importance and practical applications of python pandas in data manipulation and analysis are emphasized, including its suitability for working with different kinds of data such as tabular data, time series data, and statistical data sets. importance and practical applications of python pandas, suitability for different kinds of data', 'The process of installing python pandas on various platforms is detailed, including using the command line, terminal, and anaconda prompt, with a strong emphasis on the integral role of pandas in data-related projects and the ease of installation. process of installing python pandas, emphasis on the integral role of pandas, ease of installation', 'The creation of data frames and series using pandas is demonstrated, covering the creation of a series with a default index and the creation of data frames using a dictionary object and a numpy array, highlighting the practical implementation of pandas in data structuring. 
creation of data frames and series, practical implementation of pandas in data structuring']}, {'end': 5107.71, 'start': 4392.451, 'title': 'Creating data frames and viewing data in pandas', 'summary': 'Demonstrates creating a data frame by passing a dictionary of objects and showcases the process of viewing data using functions like head, tail, describe, sorting, selecting a single column or rows, and using labels, with python and pandas.', 'duration': 715.259, 'highlights': ['The chapter demonstrates creating a data frame by passing a dictionary of objects such as lists, timestamps, series, numpy arrays, and categorical objects, used to showcase different data types and objects for creating a data frame.', 'The process of viewing data using functions like head, tail, describe, sorting, and selecting a single column or rows, and using labels is explained, providing insights into the practical application of these functions in pandas for data analysis.', 'The demonstration includes using functions like DF.head(), DF.tail(), DF.describe(), DF.sort_values(), DF.loc[] for selecting data by labels, and slicing rows, offering a comprehensive understanding of viewing and manipulating data in pandas.']}], 'duration': 2024.112, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA3083598.jpg', 'highlights': ['The chapter covers basic numpy array operations such as reshaping, slicing, finding minimum, maximum, and sum.', 'Demonstrating vertical and horizontal stacking of arrays, showcasing np.vstack and np.hstack with clear examples.', 'The process of installing python pandas on various platforms is detailed, including using the command line, terminal, and anaconda prompt.', 'The demonstration includes using functions like DF.head(), DF.tail(), DF.describe(), DF.sort_values(), DF.loc[] for selecting data by labels, and slicing rows.']}, {'end': 7760.656, 'segs': [{'end': 5479.7, 'src': 'embed', 'start': 5451.25, 'weight': 3, 
'content': [{'end': 5456.793, 'text': 'to run this now, and let us move ahead to the next topic that we have, which is Pandas operations.', 'start': 5451.25, 'duration': 5.543}, {'end': 5463.517, 'text': 'Pandas operations are nothing but a few operations that you can apply on the data frame or any other Pandas object.', 'start': 5456.793, 'duration': 6.724}, {'end': 5470.621, 'text': 'So we have descriptive statistics that we can apply, we can apply functions, histogramming is there, and string methods are also there.', 'start': 5464.057, 'duration': 6.564}, {'end': 5473.743, 'text': "So I'll tell you what histogramming is when we are talking about it.", 'start': 5471.222, 'duration': 2.521}, {'end': 5479.7, 'text': "So let's take it up to Jupyter Notebook again, guys, and I'll tell you how you can actually work with pandas operations.", 'start': 5474.278, 'duration': 5.422}], 'summary': 'Pandas operations include descriptive statistics, histogramming, and string methods for data frames.', 'duration': 28.45, 'max_score': 5451.25, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA5451250.jpg'}, {'end': 6407.538, 'src': 'embed', 'start': 6376.113, 'weight': 0, 'content': [{'end': 6380.295, 'text': 'So this is how we stack, or compress a level in the data frame columns, guys.', 'start': 6376.113, 'duration': 4.182}, {'end': 6388.486, 'text': 'And with a stacked data frame or series having a MultiIndex as the index, the inverse operation of stack is unstack,', 'start': 6381.201, 'duration': 7.285}, {'end': 6393.949, 'text': "which by default is going to unstack whatever you have done using stack.", 'start': 6388.486, 'duration': 5.463}, {'end': 6397.812, 'text': "So we'll do df2 dot stack.", 'start': 6394.79, 'duration': 3.022}, {'end': 6403.035, 'text': 'And this is how you unstack.', 'start': 6401.394, 'duration': 1.641}, {'end': 6407.538, 'text': "Now we'll talk about the pivot tables guys or wait.", 'start': 
6404.776, 'duration': 2.762}], 'summary': 'Compressing a level of data frame columns using stack, and reversing it with unstack.', 'duration': 31.425, 'max_score': 6376.113, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA6376113.jpg'}, {'end': 6617.69, 'src': 'embed', 'start': 6574.084, 'weight': 4, 'content': [{'end': 6576.486, 'text': 'So this is how it looks now.', 'start': 6574.084, 'duration': 2.402}, {'end': 6579.709, 'text': 'We can produce pivot tables from this data very easily, guys.', 'start': 6576.646, 'duration': 3.063}, {'end': 6583.832, 'text': 'The very reason for creating this data frame was to get the pivot tables.', 'start': 6580.549, 'duration': 3.283}, {'end': 6593.074, 'text': "Now, what I'll do is I'll just write PD dot Pivot and I'm gonna pass df over here.", 'start': 6584.492, 'duration': 8.582}, {'end': 6617.69, 'text': "All right, df, and the values are gonna be, let's say, b, and index is equal to a and b, and columns, let's say, is equal to c.", 'start': 6593.074, 'duration': 24.616}], 'summary': 'Pivot tables can be generated easily from the data frame using the pd dot pivot method.', 'duration': 43.606, 'max_score': 6574.084, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA6574084.jpg'}, {'end': 6924.343, 'src': 'embed', 'start': 6887.314, 'weight': 1, 'content': [{'end': 6890.475, 'text': "Little change to this only, so I'll just copy this and paste it over here.", 'start': 6887.314, 'duration': 3.161}, {'end': 6896.132, 'text': 'And here, I have to make a few changes, that is, zero, zero.', 'start': 6891.568, 'duration': 4.564}, {'end': 6902.198, 'text': "The rest, everything is gonna be fine, and we'll change this to five.", 'start': 6897.774, 'duration': 4.424}, {'end': 6916.29, 'text': "All right, so now we make another timestamp, and we just use pd.Series, and inside this, I'm gonna use np.random.randn.", 'start': 6906.121, 'duration': 
10.169}, {'end': 6924.343, 'text': 'Random number and the random number is gonna be the length of dates.', 'start': 6918.698, 'duration': 5.645}], 'summary': 'Making minor changes, setting some values to zero and others to five, generating a random number for the length of dates.', 'duration': 37.029, 'max_score': 6887.314, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA6887314.jpg'}], 'start': 5108.09, 'title': 'Pandas data manipulation', 'summary': 'Comprehensively covers data frame operations, handling missing values, applying operations, manipulating and analyzing data using various functions and methods with practical examples, and the importance of data visualization in understanding trends and making better decisions.', 'chapters': [{'end': 5402.643, 'start': 5108.09, 'title': 'Data frame operations & handling missing values', 'summary': 'Covers data frame operations including selecting values, boolean indexing, setting new values, and handling missing data, with examples like selecting values using df.iloc, boolean indexing for a column, and handling missing values by re-indexing and dropping columns with null values.', 'duration': 294.553, 'highlights': ['The chapter covers data frame operations including selecting values, boolean indexing, setting new values, and handling missing data, with examples like selecting values using df.iloc, boolean indexing for a column, and handling missing values by re-indexing and dropping columns with null values.', "Boolean indexing is used to filter data in a data frame based on a specific condition, such as df['a'] > 0, and is important for applying functions to the data frame.", 'Setting new values inside a data frame can be done by aligning the data by indexes, using methods like setting values by label, position, or assigning with a numpy array.', 'Handling missing data involves re-indexing, checking for null values, and dropping columns with any null values, 
or filling in the missing data.', 'Pandas primarily uses the value np.nan to represent missing data, and re-indexing allows for changing, adding, or deleting indexes on a specified axis.']}, {'end': 5897.45, 'start': 5402.703, 'title': 'Pandas data frame operations', 'summary': 'Covers checking for missing values, applying operations like descriptive statistics, histogramming, string methods, and merging data frames, showcasing various functions and methods with practical examples.', 'duration': 494.747, 'highlights': ['Descriptive statistics like mean values and applying functions to data frames are demonstrated, including lambda functions for subtraction and various other functions such as sum.', 'Histogramming is explained as a representation of the distribution of data, and value counts for histogramming are shown.', 'String methods are illustrated on a Pandas Series for processing each element of the array, showcasing operations like changing case (to upper case) and making changes to string values.', 'Checking for missing values and obtaining a Boolean mask for null values in the data frame is explained using the pd.isna function and the df.notna method.', 'Merging data frames using the concat and join functions, including breaking a data frame into pieces and combining them again, is demonstrated.', 'Broadcasting and alignment of objects in Pandas, and automatic broadcasting along the specified dimension are clarified with practical examples.']}, {'end': 6205.031, 'start': 5898.811, 'title': 'Pandas data manipulation', 'summary': 'Covers the use of pandas functions like concat, join, merge, and grouping to manipulate and analyze data, as well as the concepts of stack and pivot table.', 'duration': 306.22, 'highlights': ['The chapter demonstrates the use of the pandas concat function to concatenate multiple pieces of data together, providing an example of how to use the function and its significance.', 'Explanation of left and right join using the merge function, showcasing 
the process of joining two data frames and handling key errors, emphasizing practical application.', 'Demonstration of grouping in pandas, including the process of splitting data into groups, applying functions, and combining results, with examples of handling key errors.', 'Explanation of the stack function in pandas, detailing its purpose in stacking prescribed levels from columns to index and returning a reshaped data frame or series with a multi-level index, indicating practical application and its significance.']}, {'end': 6850.962, 'start': 6205.031, 'title': 'Pandas data frame and methods', 'summary': 'Covers creating data frames, using stack and unstack methods, pivot tables, time series conversion, and categorical data in pandas, with examples and code snippets.', 'duration': 645.931, 'highlights': ['Pivot tables and time series conversion are covered in detail, with examples and code snippets. Pivot tables, time series conversion, examples, code snippets', 'Explanation on using stack and unstack methods for compressing and uncompressing levels in data frame columns. Stack method, unstack method, compressing levels, uncompressing levels', 'Creation of data frames, including values and index, with examples and code snippets, is explained. Data frame creation, values and index, examples, code snippets', 'Importance of categorical data and its significance in numerical assignments is discussed with examples. 
Categorical data, numerical assignments, examples']}, {'end': 7760.656, 'start': 6851.242, 'title': 'Pandas data manipulation and visualization', 'summary': 'Covers pandas data manipulation techniques including resampling time series, creating time zone representation, converting between time zones, using categoricals, plotting using pandas, and reading and writing to files, highlighting the need for data visualization in understanding trends and making better decisions.', 'duration': 909.414, 'highlights': ['The chapter covers Pandas data manipulation techniques including resampling time series, creating time zone representation, converting between time zones, using categoricals, plotting using Pandas, and reading and writing to files. Covers various data manipulation techniques in Pandas including resampling time series, creating time zone representation, converting between time zones, using categoricals, plotting using Pandas, and reading and writing to files.', 'It emphasizes the need for data visualization in understanding trends and making better decisions. Emphasizes the importance of data visualization for understanding trends and making better decisions.', 'Data visualization allows quick interpretation of the data and helps in experimenting with different variables for better analysis. 
Data visualization allows quick interpretation of the data and helps in experimenting with different variables for better analysis.']}], 'duration': 2652.566, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA5108090.jpg', 'highlights': ['Comprehensively covers data frame operations, handling missing values, applying operations, manipulating and analyzing data using various functions and methods with practical examples, and the importance of data visualization in understanding trends and making better decisions.', 'Descriptive statistics like mean values and applying functions to data frames are demonstrated, including lambda functions for subtraction and various other functions such as sum.', 'Demonstration of grouping in pandas, including the process of splitting data into groups, applying functions, and combining results, with examples of handling key errors.', 'Pivot tables and time series conversion are covered in detail, with examples and code snippets. Pivot tables, time series conversion, examples, code snippets', 'Covers various data manipulation techniques in Pandas including resampling time series, creating time zone representation, converting between time zones, using categoricals, plotting using Pandas, and reading and writing to files.']}, {'end': 10053.948, 'segs': [{'end': 7859.079, 'src': 'embed', 'start': 7823.523, 'weight': 6, 'content': [{'end': 7824.744, 'text': 'Now, this is very important, guys.', 'start': 7823.523, 'duration': 1.221}, {'end': 7830.667, 'text': "Obviously, you won't be selling jackets or sweaters in summers, right? 
And you won't be selling shaving cream to kids.", 'start': 7825.124, 'duration': 5.543}, {'end': 7837.77, 'text': 'So you should know who your target audience is, where you are selling your product and to whom you are selling it.', 'start': 7831.107, 'duration': 6.663}, {'end': 7842.032, 'text': 'So that is one very important field where we need data visualization.', 'start': 7838.09, 'duration': 3.942}, {'end': 7844.853, 'text': 'And you can even use it to predict sales volume.', 'start': 7842.492, 'duration': 2.361}, {'end': 7848.999, 'text': 'Fine guys, so we have looked at a lot of examples for data visualization.', 'start': 7845.413, 'duration': 3.586}, {'end': 7852.855, 'text': 'Now let us understand the diagram that is there in front of your screen.', 'start': 7849.833, 'duration': 3.022}, {'end': 7856.437, 'text': 'So it basically tells you how to find insights from your data.', 'start': 7852.935, 'duration': 3.502}, {'end': 7859.079, 'text': 'Now for that we need data visualization.', 'start': 7856.777, 'duration': 2.302}], 'summary': 'Data visualization helps understand target audience and predict sales volume.', 'duration': 35.556, 'max_score': 7823.523, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA7823523.jpg'}, {'end': 7987.609, 'src': 'embed', 'start': 7957.926, 'weight': 5, 'content': [{'end': 7963.467, 'text': 'So this is how you find insights in data where visualization plays a very, very important role, guys.', 'start': 7957.926, 'duration': 5.541}, {'end': 7965.328, 'text': 'So I hope you have understood this flow.', 'start': 7963.787, 'duration': 1.541}, {'end': 7969.168, 'text': 'So we have understood what exactly data visualization is.', 'start': 7966.448, 'duration': 2.72}, {'end': 7971.429, 'text': 'If you have any questions or doubts, you can ask me.', 'start': 7969.629, 'duration': 1.8}, {'end': 7978.839, 'text': 'All right, so Neha is asking, Matplotlib 
is used for data visualization? Yes, Neha, it is used for data visualization.', 'start': 7972.793, 'duration': 6.046}, {'end': 7980.661, 'text': "You're going to see that in the next slide.", 'start': 7979.36, 'duration': 1.301}, {'end': 7987.609, 'text': "Any other questions? All right, so we have no questions, so we'll move forward and understand what exactly Matplotlib is.", 'start': 7981.222, 'duration': 6.387}], 'summary': 'Data visualization is key for finding insights; Matplotlib is used for it.', 'duration': 29.683, 'max_score': 7957.926, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA7957926.jpg'}, {'end': 8279.887, 'src': 'embed', 'start': 8236.199, 'weight': 0, 'content': [{'end': 8241.683, 'text': "So you don't know what the axes are, what the x-axis is, what the y-axis is, or what the graph is about.", 'start': 8236.199, 'duration': 5.484}, {'end': 8248.527, 'text': "Plus, in terms of programming, it is very unlikely that you'll actually be hard-coding data into the plt.plot function.", 'start': 8242.063, 'duration': 6.464}, {'end': 8253.067, 'text': 'Now, over here, you can notice that we are actually filling in data, like 1, 2, 3, 4, 5, 1.', 'start': 8248.566, 'duration': 4.501}, {'end': 8256.171, 'text': 'Instead, you will be passing variables into it, all right?', 'start': 8253.07, 'duration': 3.101}, {'end': 8261.174, 'text': 'So it should be something like plt.plot(x, y),', 'start': 8256.732, 'duration': 4.442}, {'end': 8263.936, 'text': 'if you have defined variables x and y, all right?', 'start': 8261.174, 'duration': 2.762}, {'end': 8271.199, 'text': 'So now let us show plotting variables, as well as adding some descriptive labels and a good title.', 'start': 8264.692, 'duration': 6.507}, {'end': 8272.66, 'text': 'How does that sound, guys?', 'start': 8271.759, 'duration': 0.901}, {'end': 8275.942, 'text': 'All right, so I can see a lot of people are pretty 
excited.', 'start': 8273.761, 'duration': 2.181}, {'end': 8279.887, 'text': "so we'll move forward and see how to add a title and labels and plot variables.", 'start': 8275.942, 'duration': 3.945}], 'summary': 'Introduction to plotting variables and adding labels in programming.', 'duration': 43.688, 'max_score': 8236.199, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA8236199.jpg'}, {'end': 8327.798, 'src': 'embed', 'start': 8297.86, 'weight': 3, 'content': [{'end': 8304.025, 'text': 'You can put whatever values you want, and this can even contain a data frame, a Pandas data frame that has a data set.', 'start': 8297.86, 'duration': 6.165}, {'end': 8309.03, 'text': "After that, I've called the plt.plot function with x and y.", 'start': 8304.566, 'duration': 4.464}, {'end': 8313.773, 'text': "Instead of filling in the data, I'm putting in variables that actually contain data.", 'start': 8309.03, 'duration': 4.743}, {'end': 8321.253, 'text': "After that I've added a title, which is Info, then a y label, which is nothing but Y axis, and an x label, which is nothing but X axis.", 'start': 8314.668, 'duration': 6.585}, {'end': 8327.798, 'text': "After that I need to show my plot, so for that I've called plt.show, and the result is there in front of your screen.", 'start': 8321.913, 'duration': 5.885}], 'summary': 'Using pandas data frame, plt.plot function with x and y variables to display a plot.', 'duration': 29.938, 'max_score': 8297.86, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA8297860.jpg'}, {'end': 8465.097, 'src': 'embed', 'start': 8439.778, 'weight': 16, 'content': [{'end': 8446.744, 'text': 'So you can see that we have a graph that contains the title info, y-axis as the y-label, as well as x-axis as the x-label.', 'start': 8439.778, 'duration': 6.966}, {'end': 8451.168, 'text': "Now I'll again open my slides, and we'll see what next are 
we going to see.", 'start': 8447.485, 'duration': 3.683}, {'end': 8453.571, 'text': 'So any questions or doubts still here.', 'start': 8452.009, 'duration': 1.562}, {'end': 8455.272, 'text': 'guys, you can ask me any questions?', 'start': 8453.571, 'duration': 1.701}, {'end': 8458.235, 'text': 'So we have no questions.', 'start': 8457.074, 'duration': 1.161}, {'end': 8458.936, 'text': "so what I'll do?", 'start': 8458.235, 'duration': 0.701}, {'end': 8461.017, 'text': "I'll open my slides and we'll move forward.", 'start': 8458.936, 'duration': 2.081}, {'end': 8465.097, 'text': 'Now this graph is pretty much incomplete or ugly, I would say.', 'start': 8461.755, 'duration': 3.342}], 'summary': 'Presentation on graph with incomplete data and no questions.', 'duration': 25.319, 'max_score': 8439.778, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA8439778.jpg'}, {'end': 8938.407, 'src': 'embed', 'start': 8905.895, 'weight': 8, 'content': [{'end': 8911.178, 'text': 'Then again for one more plot I have plt.bar inside that I have data filled in.', 'start': 8905.895, 'duration': 5.283}, {'end': 8917.302, 'text': "Then label and then in order to differentiate between both of them I'm using a color green for this particular bar graph.", 'start': 8911.659, 'duration': 5.643}, {'end': 8922.424, 'text': "Then comes legend and then after that I've defined x label as well as y label.", 'start': 8917.602, 'duration': 4.822}, {'end': 8925.566, 'text': 'After that I have title and then finally show it.', 'start': 8923.125, 'duration': 2.441}, {'end': 8928.645, 'text': 'Now go ahead and run this and see if it works or not.', 'start': 8926.004, 'duration': 2.641}, {'end': 8930.005, 'text': 'And yes it does.', 'start': 8929.225, 'duration': 0.78}, {'end': 8938.407, 'text': 'So we have legend here, we have x label, we have y label as well as x label, and we have a title for our graph.', 'start': 8931.745, 'duration': 6.662}], 
'summary': 'Plotted a bar graph with a legend, x label, y label, and title. It works.', 'duration': 32.512, 'max_score': 8905.895, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA8905895.jpg'}, {'end': 9573.353, 'src': 'embed', 'start': 9546.525, 'weight': 7, 'content': [{'end': 9550.474, 'text': 'This is our other graph, and this is our first graph.', 'start': 9546.525, 'duration': 3.949}, {'end': 9555.76, 'text': 'So you can notice the difference between both of these graphs, right? So this is why we use subplot.', 'start': 9550.554, 'duration': 5.206}, {'end': 9559.345, 'text': 'So we have certain values in the subplot, right? Two, two, and one.', 'start': 9556.181, 'duration': 3.164}, {'end': 9562.949, 'text': 'Now this two means that we have two graphs available with us.', 'start': 9559.685, 'duration': 3.264}, {'end': 9566.771, 'text': "After that, whether I'm aligning it horizontally or vertically.", 'start': 9563.71, 'duration': 3.061}, {'end': 9570.132, 'text': "So if I'm doing that horizontally, horizontally I have two graphs.", 'start': 9566.811, 'duration': 3.321}, {'end': 9573.353, 'text': 'Then, among those two graphs, this is my first graph.', 'start': 9570.552, 'duration': 2.801}], 'summary': 'Comparison of two graphs using subplots with 2 available graphs.', 'duration': 26.828, 'max_score': 9546.525, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA9546525.jpg'}], 'start': 7761.137, 'title': 'Data visualization and matplotlib fundamentals', 'summary': 'Highlights the role of data visualization in decision making, demonstrating its application in finance and insights from data. 
it also introduces matplotlib, covers its fundamental concepts, different types of plots, customization, and practical execution of plotting graphs.', 'chapters': [{'end': 7978.839, 'start': 7761.137, 'title': 'Role of data visualization in decision making', 'summary': 'Highlights the importance of data visualization in enabling decision makers to understand complex concepts, with examples of its application in finance and identifying areas for improvement, as well as the process of finding insights from data through visualization.', 'duration': 217.702, 'highlights': ['Data visualization enables decision makers to grasp difficult concepts and identify new patterns Data visualization helps decision makers understand complex concepts and identify new patterns, aiding in informed decision making.', 'Application of data visualization in finance to determine investment opportunities Data visualization can be used in finance to identify investment opportunities, aiding in strategic decision making for financial investments.', 'Using data visualization to identify areas for attention or improvement in an organization Data visualization can be utilized to pinpoint areas within an organization that require attention or improvements, facilitating targeted decision making for organizational enhancements.', 'Data visualization to clarify factors influencing customer behavior and predict sales volume Data visualization can clarify the factors influencing customer behavior, aiding in predicting sales volume and influencing strategic marketing decisions.', 'Process of finding insights from data through visualization, including steps of visualization, analysis, documenting insights, and data transformation The process of finding insights from data involves visualization, analysis, documenting insights, and data transformation, ensuring informed decision making through a structured approach.']}, {'end': 8334.883, 'start': 7979.36, 'title': 'Matplotlib fundamentals and basic 
graphs', 'summary': 'Introduces the fundamental concepts of matplotlib, discusses various types of plots available, and demonstrates the process of creating a basic graph using matplotlib, emphasizing the importance of adding title labels and plotting variables.', 'duration': 355.523, 'highlights': ['The chapter introduces the fundamental concepts of Matplotlib, emphasizing the importance of understanding how Matplotlib works fundamentally and the process of creating a basic graph using Matplotlib.', 'Various types of plots available in Matplotlib are listed, including bar graph, histograms, scatter plot, pie plot, hexagonal bin plot, and area plot.', 'The process of creating a basic graph using Matplotlib, including importing Pyplot, plotting the graph to the canvas, and showing the plot, is demonstrated, resulting in the generation of a simple graph with three lines of code.', 'The importance of adding title labels and plotting variables to the graph is emphasized, and the process of adding title labels and plotting variables is demonstrated, resulting in the addition of a title, x-axis label, and y-axis label to the graph.']}, {'end': 8883.542, 'start': 8336.24, 'title': 'Plotting graphs with matplotlib', 'summary': 'Covers the practical execution of plotting graphs using matplotlib, including customizing graph styles, adding grid lines, labels, and legends, followed by a demonstration of how to plot a bar graph using matplotlib to compare different groups and measure changes over time.', 'duration': 547.302, 'highlights': ['The chapter covers the practical execution of plotting graphs using Matplotlib, including customizing graph styles, adding grid lines, labels, and legends, followed by a demonstration of how to plot a bar graph using Matplotlib to compare different groups and measure changes over time.', 'Bar graphs are used to compare things between different groups and are well-suited for measuring larger changes over time.', 'To plot a bar graph using 
Matplotlib, the code involves importing PyPlot, using plt.bar with filled-in data or variables, defining labels for each plot, specifying colors, adding a legend, X and Y labels, and a title before displaying the graph.']}, {'end': 9243.605, 'start': 8884.182, 'title': 'Using matplotlib for graphs', 'summary': 'Covers using matplotlib to create bar plots, histograms, scatter plots, and area plots, explaining their differences and practical implementation.', 'duration': 359.423, 'highlights': ['The chapter explains the difference between histogram and bar plot, emphasizing that histograms use quantitative variables while bar plots use categorical variables. Histograms use quantitative variables, while bar plots use categorical variables. For example, bar plots are used to plot the GDP growth of cities, which is categorical data, while histograms are used to analyze the contribution of each age group to GDP growth, which involves quantitative variables.', 'The process of creating a histogram using plt.hist and defining the data, bins, hist type, width, x label, y label, title, legend, and displaying the plot is demonstrated. The process of creating a histogram using plt.hist is detailed, where data, bins, hist type, width, x label, y label, title, legend, and displaying the plot are defined.', 'The explanation of scatter plots, their purpose in comparing two or three variables for correlation, and the practical implementation using plt.scatter is provided. Scatter plots are used to compare variables for correlation, and the practical implementation is shown using plt.scatter, where x and y variables are defined, along with label, color, x label, y label, title, legend, and displaying the plot.', 'The concept and usage of area plots for tracking changes in related groups over time are explained, highlighting their suitability for tracking changes in multiple related groups within a category. 
Area plots are used to track changes in related groups over time, such as public and private groups, making them suitable for tracking changes in multiple related groups within a category.']}, {'end': 10053.948, 'start': 9243.905, 'title': 'Seaborn data visualization', 'summary': 'Covers various types of plots including stack plots, pie charts, and multiple plots, and then introduces the seaborn library, explaining its features, advantages over matplotlib, dependencies, installation, and various plotting functions, emphasizing the use of relplot to visualize statistical relationships.', 'duration': 810.043, 'highlights': ['The chapter covers various types of plots including stack plots, pie charts, and multiple plots Highlights the different types of plots covered in the chapter.', 'Introduces the Seaborn library, explaining its features, advantages over matplotlib, dependencies, installation, and various plotting functions Provides an overview of the Seaborn library and its various aspects.', 'Emphasizes the use of relplot to visualize statistical relationships Highlights the importance of using relplot for visualizing statistical relationships in Seaborn.']}], 'duration': 2292.811, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA7761137.jpg', 'highlights': ['Data visualization enables decision makers to grasp difficult concepts and identify new patterns, aiding in informed decision making.', 'Application of data visualization in finance to determine investment opportunities, aiding in strategic decision making for financial investments.', 'Using data visualization to identify areas for attention or improvement in an organization, facilitating targeted decision making for organizational enhancements.', 'Data visualization can clarify the factors influencing customer behavior, aiding in predicting sales volume and influencing strategic marketing decisions.', 'The process of finding insights from data involves 
visualization, analysis, documenting insights, and data transformation, ensuring informed decision making through a structured approach.', 'The chapter introduces the fundamental concepts of Matplotlib, emphasizing the importance of understanding how Matplotlib works fundamentally and the process of creating a basic graph using Matplotlib.', 'Various types of plots available in Matplotlib are listed, including bar graph, histograms, scatter plot, pie plot, hexagonal bin plot, and area plot.', 'The process of creating a basic graph using Matplotlib, including importing Pyplot, plotting the graph to the canvas, and showing the plot, is demonstrated, resulting in the generation of a simple graph with three lines of code.', 'The importance of adding title labels and plotting variables to the graph is emphasized, and the process of adding title labels and plotting variables is demonstrated, resulting in the addition of a title, x-axis label, and y-axis label to the graph.', 'The chapter covers the practical execution of plotting graphs using Matplotlib, including customizing graph styles, adding grid lines, labels, and legends, followed by a demonstration of how to plot a bar graph using Matplotlib to compare different groups and measure changes over time.', 'Bar graphs are used to compare things between different groups and are well-suited for measuring larger changes over time.', 'The chapter explains the difference between histogram and bar plot, emphasizing that histograms use quantitative variables while bar plots use categorical variables.', 'The process of creating a histogram using plt.hist and defining the data, bins, hist type, width, x label, y label, title, legend, and displaying the plot is demonstrated.', 'The explanation of scatter plots, their purpose in comparing two or three variables for correlation, and the practical implementation using plt.scatter is provided.', 'The concept and usage of area plots for tracking changes in related groups over time 
are explained, highlighting their suitability for tracking changes in multiple related groups within a category.', 'The chapter covers various types of plots including stack plots, pie charts, and multiple plots.', 'Introduces the Seaborn library, explaining its features, advantages over matplotlib, dependencies, installation, and various plotting functions.', 'Emphasizes the use of relplot to visualize statistical relationships.']}, {'end': 13210.122, 'segs': [{'end': 10081.824, 'src': 'embed', 'start': 10054.308, 'weight': 1, 'content': [{'end': 10058.17, 'text': "To achieve this, we'll be using the catplot function available in Seaborn.", 'start': 10054.308, 'duration': 3.862}, {'end': 10067.216, 'text': 'Like relplot, catplot is also a figure-level function, giving access to three families of axes-level functions, which are categorical scatterplots,', 'start': 10058.911, 'duration': 8.305}, {'end': 10069.257, 'text': 'categorical distribution plots and categorical estimate plots.', 'start': 10067.216, 'duration': 2.041}, {'end': 10074.899, 'text': 'Okay. Now, let me jump on to my Jupyter notebook and show you all a few examples regarding this.', 'start': 10070.456, 'duration': 4.443}, {'end': 10081.824, 'text': "Okay, so here I'll use the same tips data set, since I've already stored it in the variable B.", 'start': 10075.82, 'duration': 6.004}], 'summary': "Using Seaborn's catplot function for figure-level plots in a Jupyter notebook.", 'duration': 27.516, 'max_score': 10054.308, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA10054308.jpg'}, {'end': 10600.844, 'src': 'embed', 'start': 10564.894, 'weight': 12, 'content': [{'end': 10566.136, 'text': "Then I'll use a boxen plot.", 'start': 10564.894, 'duration': 1.242}, {'end': 10570.917, 'text': "Or rather, I've used the boxen plot.", 'start': 10569.736, 'duration': 1.181}, {'end': 10583.186, 'text': "So this time I'll use the box plot function and I'll specify the x-axis to be the days and, sorry, 
it's day not days, and the y-axis to be the total bill.", 'start': 10570.937, 'duration': 12.249}, {'end': 10590.892, 'text': 'And data is nothing but a... sorry, guys.', 'start': 10586.949, 'duration': 3.943}, {'end': 10592.373, 'text': "I've made a spelling mistake here.", 'start': 10590.932, 'duration': 1.441}, {'end': 10593.594, 'text': "It's color codes.", 'start': 10592.853, 'duration': 0.741}, {'end': 10600.844, 'text': 'Okay, so here, as you can see, my plot has axes all around it.', 'start': 10595.075, 'duration': 5.769}], 'summary': 'Using box plot function, x-axis = day, y-axis = total bill.', 'duration': 35.95, 'max_score': 10564.894, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA10564894.jpg'}, {'end': 10796.74, 'src': 'embed', 'start': 10771.251, 'weight': 0, 'content': [{'end': 10776.873, 'text': 'The second category is quantitative data, where, as the name suggests, we are talking about something measurable,', 'start': 10771.251, 'duration': 5.622}, {'end': 10778.714, 'text': "because we're talking in terms of quantity.", 'start': 10776.873, 'duration': 1.841}, {'end': 10786.497, 'text': 'So quantitative data deals with numbers, things you can measure, dimensions such as height, width, length, temperature of a room,', 'start': 10779.014, 'duration': 7.483}, {'end': 10789.498, 'text': 'temperature of a country, temperature of an area, humidity and so on.', 'start': 10786.497, 'duration': 3.001}, {'end': 10795.279, 'text': 'Going into more detail, you can broadly divide qualitative data into two types.', 'start': 10789.998, 'duration': 5.281}, {'end': 10796.74, 'text': 'One is called nominal data.', 'start': 10795.599, 'duration': 1.141}], 'summary': 'Quantitative data deals with measurable numbers and dimensions.', 'duration': 25.489, 'max_score': 10771.251, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA10771251.jpg'}, {'end': 11430.972, 'src': 'embed', 'start': 11399.946, 'weight': 8, 'content': [{'end': 11408.428, 'text': 'Now, if you try to relate the revision time and IQ with the exam score, you will see a linear relationship between these variables.', 'start': 11399.946, 'duration': 8.482}, {'end': 11414.109, 'text': 'That means the more revision time, the better the exam score; the less revision time, the lower the exam score.', 'start': 11408.428, 'duration': 5.681}, {'end': 11415.129, 'text': 'So what have they done?', 'start': 11414.109, 'duration': 1.02}, {'end': 11416.809, 'text': 'They have divided the group.', 'start': 11415.129, 'duration': 1.68}, {'end': 11419.89, 'text': 'There are 100 students and the group is divided into two:', 'start': 11416.809, 'duration': 3.081}, {'end': 11420.93, 'text': 'one has 50,', 'start': 11419.89, 'duration': 1.04}, {'end': 11430.972, 'text': 'the other has 50 students. The instruction given to group one is that they have to revise for 20 hours, and the group two students are not allowed to do any revision.', 'start': 11420.93, 'duration': 10.042}], 'summary': 'Increased revision time leads to better exam scores, as seen in a study with 100 students divided into two groups: 20-hour revision vs. 
no revision.', 'duration': 31.026, 'max_score': 11399.946, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA11399946.jpg'}, {'end': 12465.279, 'src': 'embed', 'start': 12432.863, 'weight': 4, 'content': [{'end': 12437.824, 'text': 'Then we need to calculate the difference of each point from the mean value, square it, and then sum it all.', 'start': 12432.863, 'duration': 4.961}, {'end': 12439.524, 'text': "So that's what we are doing in this step.", 'start': 12438.084, 'duration': 1.44}, {'end': 12441.045, 'text': 'We are squaring each individual value.', 'start': 12439.544, 'duration': 1.501}, {'end': 12442.525, 'text': 'Last step, we are summing it all.', 'start': 12441.285, 'duration': 1.24}, {'end': 12445.726, 'text': 'And then we are dividing by n minus 1 for a sample, or by the total count n for the whole population.', 'start': 12442.825, 'duration': 2.901}, {'end': 12448.986, 'text': "And then you'll get the result: the mean squared difference, that is, the variance, is 8.9.", 'start': 12446.146, 'duration': 2.84}, {'end': 12456.648, 'text': "And to get the standard deviation, you take its square root, which gives 2.983 as the standard deviation of the whole data set.", 'start': 12448.986, 'duration': 7.662}, {'end': 12465.279, 'text': 'So in order to represent the same thing in R, as we already saw, we take the sample values and then we create a histogram in order to represent that.', 'start': 12457.472, 'duration': 7.807}], 'summary': 'Calculation results in a standard deviation of 2.983 for the data set.', 'duration': 32.416, 'max_score': 12432.863, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA12432863.jpg'}, {'end': 12508.921, 'src': 'embed', 'start': 12474.033, 'weight': 11, 'content': [{'end': 12475.614, 'text': 'So this is the initial histogram.', 'start': 12474.033, 'duration': 1.581}, {'end': 12476.714, 'text': 'Then we can put the mean value.', 'start': 12475.634, 'duration': 1.08}, {'end': 12479.216, 'text': 
"In order to print the mean value, we'll print the mean variable.", 'start': 12476.955, 'duration': 2.261}, {'end': 12480.297, 'text': "We'll get the mean value.", 'start': 12479.436, 'duration': 0.861}, {'end': 12481.557, 'text': "We'll print the median value.", 'start': 12480.497, 'duration': 1.06}, {'end': 12484.179, 'text': 'And then we define a function for getting the mode value.', 'start': 12481.857, 'duration': 2.322}, {'end': 12487.14, 'text': 'Okay. And then we store the value in the result variable.', 'start': 12484.359, 'duration': 2.781}, {'end': 12488.041, 'text': 'Result variable.', 'start': 12487.341, 'duration': 0.7}, {'end': 12489.522, 'text': "We'll print the mode.", 'start': 12488.241, 'duration': 1.281}, {'end': 12490.803, 'text': "We'll get the mode values like this.", 'start': 12489.582, 'duration': 1.221}, {'end': 12496.246, 'text': "Right. So that's how we can do the same thing for calculating mean, median, and mode in R.", 'start': 12491.123, 'duration': 5.123}, {'end': 12498.307, 'text': 'Okay. And at the same time, we can draw the graph as well.', 'start': 12496.246, 'duration': 2.061}, {'end': 12501.318, 'text': 'So, in order to calculate the range, as I told you,', 'start': 12498.817, 'duration': 2.501}, {'end': 12508.921, 'text': 'most languages support calculating the maximum and minimum values of the whole data set directly with the help of some APIs.', 'start': 12501.318, 'duration': 7.603}], 'summary': 'The transcript discusses calculating mean, median, and mode in R, while also mentioning drawing a graph and calculating the range using built-in APIs.', 'duration': 34.888, 'max_score': 12474.033, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA12474033.jpg'}, {'end': 12542.246, 'src': 'embed', 'start': 12509.121, 'weight': 5, 'content': [{'end': 12511.922, 'text': 'So those APIs are max and min here in this case.', 'start': 12509.121, 
'duration': 2.801}, {'end': 12514.944, 'text': 'So max of data minus min of data will give you the range.', 'start': 12512.323, 'duration': 2.621}, {'end': 12518.625, 'text': 'And if you want to calculate the variance of the data, just call the var function.', 'start': 12515.084, 'duration': 3.541}, {'end': 12520.466, 'text': 'It will give you the variance of the whole data set.', 'start': 12518.825, 'duration': 1.641}, {'end': 12525.42, 'text': 'All right, so another important term in this course is information gain and entropy.', 'start': 12521.098, 'duration': 4.322}, {'end': 12530.362, 'text': 'So, before we go into the details of information gain and entropy,', 'start': 12525.72, 'duration': 4.642}, {'end': 12535.864, 'text': 'first of all you need to understand what is meant by purity or what is meant by impurity.', 'start': 12530.362, 'duration': 5.502}, {'end': 12542.246, 'text': "So, if you're from a scientific background, you might have heard this term called entropy in physics,", 'start': 12535.924, 'duration': 6.322}], 'summary': 'APIs calculate range, variance; introduces information gain, entropy, purity, entropy in physics.', 'duration': 33.125, 'max_score': 12509.121, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA12509121.jpg'}, {'end': 13167.335, 'src': 'embed', 'start': 13139.732, 'weight': 9, 'content': [{'end': 13143.334, 'text': 'I think I would definitely start out with Python and then probably consider R.', 'start': 13139.732, 'duration': 3.602}, {'end': 13145.095, 'text': "Well, that's personal preference.", 'start': 13143.334, 'duration': 1.761}, {'end': 13148.617, 'text': "So moving on, let's check out what probability actually means.", 'start': 13145.695, 'duration': 2.922}, {'end': 13152.58, 'text': 'Well, at the most basic level, probability seeks to answer the question:', 'start': 13148.997, 'duration': 3.583}, {'end': 13154.401, 'text': 'What is the chance of an 
event happening?', 'start': 13152.74, 'duration': 1.661}, {'end': 13158.806, 'text': 'Right? Look at it this way: an event is some outcome of interest.', 'start': 13154.401, 'duration': 4.405}, {'end': 13160.928, 'text': 'To calculate the chance of an event happening,', 'start': 13158.806, 'duration': 2.122}, {'end': 13167.335, 'text': 'we also need to consider all the other events that can occur at that point in time as well, right? Well, to elaborate more on this.', 'start': 13161.248, 'duration': 6.087}], 'summary': 'Python is a good starting point. Probability answers the chance of an event occurring.', 'duration': 27.603, 'max_score': 13139.732, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA13139732.jpg'}], 'start': 10054.308, 'title': 'Data visualization and analysis', 'summary': 'Covers visualizing data with Seaborn, types of qualitative and quantitative data, representation of data and variables, non-experimental research and sampling techniques, understanding data spread and variability, and statistics in R and decision trees.', 'chapters': [{'end': 10748.468, 'start': 10054.308, 'title': 'Visualizing data with Seaborn', 'summary': "Demonstrates the usage of Seaborn's catplot, distplot, jointplot, FacetGrid, PairGrid functions to visualize data through scatter plots, violin plots, histograms, and color palettes, and also explains the concepts of qualitative and quantitative data.", 'duration': 694.16, 'highlights': ["Seaborn provides functions like catplot, distplot, jointplot, facet grid, and pair grid to visualize data through scatter plots, violin plots, histograms, and color palettes. The chapter demonstrates the usage of Seaborn's catplot, distplot, jointplot, FacetGrid, PairGrid functions to visualize data through scatter plots, violin plots, histograms, and color palettes.", 'The chapter explains the concepts of qualitative and quantitative data. 
The chapter explains the concepts of qualitative and quantitative data.', 'The definition of data is discussed, which includes facts and statistics collected for reference or analysis, varying based on the organization or field. The chapter discusses the definition of data as facts and statistics collected for reference or analysis, varying based on the organization or field.']}, {'end': 11077.394, 'start': 10748.468, 'title': 'Types of data: qualitative and quantitative', 'summary': 'Discusses the distinction between qualitative and quantitative data, covering nominal and ordinal data under qualitative, and discrete and continuous data under quantitative, with examples and characteristics.', 'duration': 328.926, 'highlights': ['Qualitative data includes nominal and ordinal data, representing characteristics with no intrinsic ordering and with inherent ordering, respectively, exemplifying gender and customer rating. Qualitative data is divided into nominal data, representing characteristics without intrinsic ordering, and ordinal data, representing characteristics with inherent ordering, as seen in examples of gender and customer rating.', 'Quantitative data encompasses discrete and continuous data, with discrete data having specific defined values and continuous data allowing further divisions and increments, illustrated through examples of product numbers and patient weight. 
Quantitative data is classified into discrete data, with specific defined values, and continuous data, allowing further divisions and increments, as shown in examples of product numbers and patient weight.']}, {'end': 11441.185, 'start': 11077.394, 'title': 'Types of data and variables', 'summary': 'Discusses the representation of weight in different units, continuous data, different distributions related to discrete and continuous data, and the categorization of variables into categorical, control, independent, and confounding variables, with a focus on their significance and examples.', 'duration': 363.791, 'highlights': ['The chapter discusses the representation of weight in different units and the concept of continuous data, which can be measured on a continuum or a scale, leading to different distributions related to discrete and continuous data sets. Weight representation in different units and the concept of continuous data, leading to different distributions related to discrete and continuous data sets.', 'Categorization of variables into categorical, control, independent, and confounding variables is explained, with examples and significance highlighted. Explanation of categorization of variables into categorical, control, independent, and confounding variables, with highlighted examples and significance.', 'A practical example of experimental research involving manipulation of independent variables and examination of their effects on dependent variables, with the focus on the linear relationship between revision time, IQ, and exam scores. 
Practical example of experimental research involving manipulation of independent variables and examination of their effects on dependent variables, focusing on the linear relationship between revision time, IQ, and exam scores.']}, {'end': 12206.869, 'start': 11441.185, 'title': 'Non-experimental research and sampling techniques', 'summary': 'Discusses non-experimental research, the concept of population and sample, and various sampling techniques including probability and non-probability sampling, followed by a brief overview of descriptive statistics and measures of central tendency and spread.', 'duration': 765.684, 'highlights': ['The chapter discusses non-experimental research, where the researcher does not manipulate the independent variable, providing an example related to the effect of illegal recreational drug use on behavior. Non-experimental research, independent variable, dependent variable, unethical nature of manipulating independent variable.', 'The concept of population and sample is explained, highlighting the process of selecting a sample from a population and the significance of choosing a representative sample. Definition of population, need for sampling, importance of representative samples, and sampling techniques.', 'Various sampling techniques such as probability sampling (random, systematic, stratified) and non-probability sampling (convenience, consecutive, quota, judgmental, snowball) are detailed with their respective characteristics and applications. Probability and non-probability sampling techniques, characteristics, and examples.', 'Descriptive statistics, measures of central tendency (mean, median, mode) and measures of spread (range, interquartile range, variance, standard deviation) are briefly introduced. 
Descriptive statistics, measures of central tendency, measures of spread, examples of calculations, and representation in RStudio.']}, {'end': 12490.803, 'start': 12206.869, 'title': 'Understanding data spread and variability', 'summary': 'Explains the concepts of range, quartiles, interquartile range, deviation, variance, and standard deviation, providing examples and calculations to understand the spread and variability of a dataset.', 'duration': 283.934, 'highlights': ['Standard deviation calculation example using a dataset of rose bushes and their flower counts, resulting in a standard deviation of 2.983. The example demonstrates the calculation of standard deviation using a dataset of rose bushes and their respective flower counts, ultimately yielding a standard deviation of 2.983.', 'Explanation of variance calculation and its relation to the mean, involving the calculation of differences, squaring, summing, and division to obtain the variance. The explanation details the process of calculating the variance, including the steps of finding the differences from the mean, squaring the differences, summing the squared differences, and dividing by the sample size to obtain the variance.', 'Definition of interquartile range as the distance between quartiles, with an example illustrating the calculation using specific quartile values. 
The definition and calculation of the interquartile range are presented, depicting it as the distance between quartiles, with an example showcasing the calculation using specific quartile values.']}, {'end': 13210.122, 'start': 12491.123, 'title': 'Statistics in R and decision trees', 'summary': 'Covers calculating mean, median, mode, and variance in R, using APIs like max and min, as well as understanding information gain, entropy, and the confusion matrix in the context of decision trees, with a preference for using Python for statistics.', 'duration': 718.999, 'highlights': ["Understanding information gain and entropy in the context of decision trees. Information gain and entropy are key concepts in decision trees, with the example demonstrating the choice of feature based on maximum information gain, favoring the 'outlook' parameter.", "Calculation of confusion matrix and its performance measures. Explaining the confusion matrix and its performance measures like sensitivity and specificity in evaluating the classification algorithm's performance.", 'Preference for using Python for statistics and its integration with other domains. Advocating for using Python for statistics due to its versatility in integrating with other domains like image analysis and text mining, providing a richer and more valuable asset.']}], 'duration': 3155.814, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA10054308.jpg', 'highlights': ['Seaborn provides functions like catplot, distplot, jointplot, facet grid, and pair grid to visualize data through scatter plots, violin plots, histograms, and color palettes.', 'Qualitative data includes nominal and ordinal data, representing characteristics with no intrinsic ordering and with inherent ordering, respectively, exemplifying gender and customer rating.', 'Quantitative data encompasses discrete and continuous data, with discrete data having specific defined values and continuous data 
allowing further divisions and increments, illustrated through examples of product numbers and patient weight.', 'Categorization of variables into categorical, control, independent, and confounding variables is explained, with examples and significance highlighted.', 'A practical example of experimental research involving manipulation of independent variables and examination of their effects on dependent variables, with the focus on the linear relationship between revision time, IQ, and exam scores.', 'The concept of population and sample is explained, highlighting the process of selecting a sample from a population and the significance of choosing a representative sample.', 'Various sampling techniques such as probability sampling (random, systematic, stratified) and non-probability sampling (convenience, consecutive, quota, judgmental, snowball) are detailed with their respective characteristics and applications.', 'Descriptive statistics, measures of central tendency (mean, median, mode) and measures of spread (range, interquartile range, variance, standard deviation) are briefly introduced.', 'Standard deviation calculation example using a dataset of rose bushes and their flower counts, resulting in a standard deviation of 2.983.', 'Explanation of variance calculation and its relation to the mean, involving the calculation of differences, squaring, summing, and division to obtain the variance.', "Understanding information gain and entropy in the context of decision trees. Information gain and entropy are key concepts in decision trees, with the example demonstrating the choice of feature based on maximum information gain, favoring the 'outlook' parameter.", "Calculation of confusion matrix and its performance measures. Explaining the confusion matrix and its performance measures like sensitivity and specificity in evaluating the classification algorithm's performance.", 'Preference for using Python for statistics and its integration with other domains. Advocating 
for using Python for statistics due to its versatility in integrating with other domains like image analysis and text mining, providing a richer and more valuable asset.']}, {'end': 14895.524, 'segs': [{'end': 13372.375, 'src': 'embed', 'start': 13336.897, 'weight': 18, 'content': [{'end': 13339.68, 'text': 'and it will return the average number of heads across all the trials.', 'start': 13336.897, 'duration': 2.783}, {'end': 13343.163, 'text': 'So the coin toss simulations will give us some interesting results.', 'start': 13340.18, 'duration': 2.983}, {'end': 13344.624, 'text': 'Check out the results on the right side.', 'start': 13343.403, 'duration': 1.221}, {'end': 13350.009, 'text': 'Well, first, the data confirms that our average number of heads does approach what the probability suggests.', 'start': 13344.964, 'duration': 5.045}, {'end': 13353.232, 'text': 'It should be somewhere around 5, right? 50%. Well, yes.', 'start': 13350.049, 'duration': 3.183}, {'end': 13359.837, 'text': 'Furthermore, this average gets closer as the number of trials grows, as you can observe here: 5.4 when you simulate 10 times,', 'start': 13353.692, 'duration': 6.145}, {'end': 13361.339, 'text': 'and eventually we come as close as 4.999.', 'start': 13359.837, 'duration': 1.502}, {'end': 13363.381, 'text': 'Right? 
4.999 when we simulated a hundred thousand times.', 'start': 13361.339, 'duration': 2.042}, {'end': 13372.375, 'text': 'Well, in 10 trials there is some error, but this error almost disappears when we do it with a hundred thousand trials, right?', 'start': 13366.71, 'duration': 5.665}], 'summary': 'Coin toss simulations show the average number of heads approaching 5, improving with more trials.', 'duration': 35.478, 'max_score': 13336.897, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA13336897.jpg'}, {'end': 14522.081, 'src': 'embed', 'start': 14494.067, 'weight': 9, 'content': [{'end': 14499.449, 'text': 'So you have to call $20 to stay in the hand, and if you win the hand, you win the $60 in the pot.', 'start': 14494.067, 'duration': 5.382}, {'end': 14505.812, 'text': 'So if your expected value is greater than $20, you should call the bet, and if it is not, you should fold. Correct?', 'start': 14499.449, 'duration': 6.363}, {'end': 14511.515, 'text': "So let's figure out if you should actually call the bet, right? So let's quickly go back to Colaboratory and check out the code for that.", 'start': 14506.292, 'duration': 5.223}, {'end': 14515.517, 'text': 'Well, should we call the bet? 
Well, we already know that the sample space is 52 cards.', 'start': 14512.075, 'duration': 3.442}, {'end': 14518.079, 'text': 'We have two hole cards, right, which are our open cards.', 'start': 14515.557, 'duration': 2.522}, {'end': 14522.081, 'text': 'The opponent has two hole cards, and then there are the community cards, which are opened up for.', 'start': 14518.079, 'duration': 4.002}], 'summary': 'Deciding whether to call a $20 bet based on expected value and game details.', 'duration': 28.014, 'max_score': 14494.067, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA14494067.jpg'}, {'end': 14573.537, 'src': 'embed', 'start': 14543.891, 'weight': 0, 'content': [{'end': 14548.195, 'text': 'guys, not the total number of cards in the deck, which is 52, but the total number of cards remaining in the deck.', 'start': 14543.891, 'duration': 4.304}, {'end': 14552.619, 'text': "And now let's determine the expected value, right? So there are $60 in the pot.", 'start': 14548.895, 'duration': 3.724}, {'end': 14556.082, 'text': 'So we need to multiply this $60 by the win probability.', 'start': 14553.019, 'duration': 3.063}, {'end': 14558.824, 'text': 'So basically, that product is the expected value here.', 'start': 14556.122, 'duration': 2.702}, {'end': 14562.667, 'text': "So lastly, we'll print the expected value and make a decision based on it.", 'start': 14559.264, 'duration': 3.403}, {'end': 14564.489, 'text': 'So the call amount is 20.', 'start': 14563.068, 'duration': 1.421}, {'end': 14568.072, 'text': "So if the expected value is greater than or equal to 20, we'll call that round.", 'start': 14564.489, 'duration': 3.583}, {'end': 14571.255, 'text': "And if the expected value is less than 20, we'll fold.", 'start': 14568.392, 'duration': 2.863}, {'end': 14573.537, 'text': 'So let me go ahead and run this code and see what we get.', 'start': 14571.595, 'duration': 1.942}], 
'summary': 'Determining expected value for a $60 pot and making decisions based on it.', 'duration': 29.646, 'max_score': 14543.891, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA14543891.jpg'}, {'end': 14701.221, 'src': 'embed', 'start': 14674.38, 'weight': 10, 'content': [{'end': 14677.743, 'text': 'We can also look at another example, like a survey conducted on Edureka employees.', 'start': 14674.38, 'duration': 3.363}, {'end': 14684.749, 'text': 'The average commute time of an Edureka employee to travel from home to the office and back comes out to be 35 minutes.', 'start': 14678.103, 'duration': 6.646}, {'end': 14686.831, 'text': "But it's an average, which means", 'start': 14685.149, 'duration': 1.682}, {'end': 14690.35, 'text': 'there are employees who take 10 minutes.', 'start': 14687.247, 'duration': 3.103}, {'end': 14692.672, 'text': 'Some take 50 minutes, or even one hour.', 'start': 14690.35, 'duration': 2.322}, {'end': 14695.235, 'text': 'Some take just 30 or 40 minutes.', 'start': 14692.672, 'duration': 2.563}, {'end': 14701.221, 'text': 'So we conducted a survey on 100 employees and found the average commute time to be 35 minutes.', 'start': 14695.235, 'duration': 5.986}], 'summary': 'Survey of 100 Edureka employees showed an average commute time of 35 minutes.', 'duration': 26.841, 'max_score': 14674.38, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA14674380.jpg'}, {'end': 14741.168, 'src': 'embed', 'start': 14708.688, 'weight': 1, 'content': [{'end': 14713.294, 'text': "So if you conduct a test on this hypothesis, that's what hypothesis testing is.", 'start': 14708.688, 'duration': 4.606}, {'end': 14719.557, 'text': 'So, to frame it formally, hypothesis testing is used to confirm your conclusion about 
the population.', 'start': 14713.594, 'duration': 5.963}, {'end': 14720.838, 'text': 'Through hypothesis testing,', 'start': 14719.957, 'duration': 0.881}, {'end': 14726.801, 'text': 'you can determine whether there is enough evidence to conclude if the hypothesis about the population parameter is true or not.', 'start': 14720.838, 'duration': 5.963}, {'end': 14729.722, 'text': "Now let's understand the null and alternate hypotheses.", 'start': 14727.361, 'duration': 2.361}, {'end': 14733.324, 'text': 'So we will understand this with the example of a criminal trial.', 'start': 14730.262, 'duration': 3.062}, {'end': 14741.168, 'text': 'So suppose there is an accused who is charged with some crime, and now the judge has to decide whether he has done the crime or not.', 'start': 14734.004, 'duration': 7.164}], 'summary': "Hypothesis testing confirms conclusions about the population. It determines if there's enough evidence to conclude if the hypothesis about the population parameter is true or not.", 'duration': 32.48, 'max_score': 14708.688, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA14708688.jpg'}, {'end': 14802.788, 'src': 'embed', 'start': 14744.8, 'weight': 3, 'content': [{'end': 14751.384, 'text': "The first hypothesis is that the accused is innocent, he hasn't done the crime, and the second is that the accused has done the crime and he is guilty.", 'start': 14744.8, 'duration': 6.584}, {'end': 14754.686, 'text': 'So you can see there are two opposite hypotheses here.', 'start': 14751.784, 'duration': 2.902}, {'end': 14760.609, 'text': 'So the first one is the null hypothesis, denoted as H0, which is the prevailing belief about the population.', 'start': 14755.226, 'duration': 5.383}, {'end': 14764.011, 'text': 'It states that there is no change or no difference from the status quo,', 'start': 14760.729, 'duration': 3.282}, {'end': 14771.095, 'text': "which means there is no difference in 
the situation, according to which the accused is innocent, he is just a member of society, and he hasn't done anything wrong.", 'start': 14764.011, 'duration': 7.084}, {'end': 14778.879, 'text': 'Now, the second one is the alternate hypothesis, denoted as H1, which is a claim that opposes the null hypothesis, which means the accused is guilty,', 'start': 14771.448, 'duration': 7.431}, {'end': 14782.174, 'text': 'which brings a change in the situation, a difference in the situation.', 'start': 14779.312, 'duration': 2.862}, {'end': 14786.277, 'text': 'Now, remember the earlier example I gave you of Maggi noodles.', 'start': 14782.614, 'duration': 3.663}, {'end': 14792.841, 'text': 'The status quo here, that is, the null hypothesis, is that the lead in the food product, meaning in the Maggi noodles,', 'start': 14786.837, 'duration': 6.004}, {'end': 14794.562, 'text': 'does not exceed the maximum limit.', 'start': 14792.841, 'duration': 1.721}, {'end': 14796.984, 'text': 'That is, it does not exceed 2.5 ppm.', 'start': 14794.943, 'duration': 2.041}, {'end': 14802.788, 'text': 'And the alternate hypothesis will be that the lead in the Maggi noodles does exceed the 2.5 ppm limit.', 'start': 14797.404, 'duration': 5.384}], 'summary': 'Two opposing hypotheses: accused is innocent (H0) vs. 
accused is guilty (H1), with a real-life example of Maggi noodles.', 'duration': 57.988, 'max_score': 14744.8, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA14744800.jpg'}, {'end': 14853.896, 'src': 'embed', 'start': 14821.562, 'weight': 6, 'content': [{'end': 14823.123, 'text': 'So now, can you please tell me which one is', 'start': 14821.562, 'duration': 1.561}, {'end': 14825.825, 'text': 'the alternate hypothesis in this situation?', 'start': 14823.123, 'duration': 2.702}, {'end': 14827.026, 'text': 'I will give you the options.', 'start': 14825.825, 'duration': 1.201}, {'end': 14832.551, 'text': 'The first option is: employees take an average commute time of 35 minutes, and option B is: employees take more than 35 minutes.', 'start': 14827.026, 'duration': 5.525}, {'end': 14837.829, 'text': 'So you can pause this video right now and answer the question in the comment section below.', 'start': 14833.407, 'duration': 4.422}, {'end': 14840.41, 'text': 'Now, look at the outcome of the criminal trial', 'start': 14838.409, 'duration': 2.001}, {'end': 14846.533, 'text': 'example: the null hypothesis is that the accused is innocent, and the alternate hypothesis is that the accused is not innocent.', 'start': 14840.41, 'duration': 6.123}, {'end': 14853.896, 'text': 'So if you see here, rejecting the null hypothesis means a guilty verdict, because the null hypothesis is the status quo, where there is no change.', 'start': 14846.953, 'duration': 6.943}], 'summary': 'A hypothesis-testing exercise on commute time, with an average of 35 minutes as one option and a longer commute as the other.', 'duration': 32.334, 'max_score': 14821.562, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA14821562.jpg'}], 'start': 13210.502, 'title': 'Probability and statistics', 'summary': 'Covers the concept of probability through coin toss simulation with 50% chance of heads or tails, explores normal 
distribution, z-score, cumulative probability, and practical applications in predicting poker hand probabilities using Python, explains probability in poker, and delves into expected value, hypothesis testing, and their practical application.', 'chapters': [{'end': 13402.123, 'start': 13210.502, 'title': 'Coin toss probability', 'summary': 'Covers the concept of probability through coin toss simulation, where the ideal chance of heads or tails is 50%, and as the number of trials increases, the average number of heads approaches 50% with minimal deviation, showcasing the relationship between probability and statistics.', 'duration': 191.621, 'highlights': ['The average number of heads approaches what the probability suggests, reaching 5.4 in 10 trials and 4.999 in a hundred thousand trials, demonstrating the concept of probability through coin toss simulation. The average number of heads approaches what the probability suggests, reaching 5.4 in 10 trials and 4.999 in a hundred thousand trials, showcasing the relationship between probability and the actual outcomes.', 'Probability serves as the framework for making predictions about the occurrence of events, while statistics involves testing these theories and drawing conclusions from the data gathered through probability. Probability serves as the framework for making predictions about the occurrence of events, while statistics involves testing these theories and drawing conclusions from the data gathered through probability.', 'The deviation from the average decreases as the number of trials increases, emphasizing the relationship between probability and statistics in predicting and analyzing outcomes. 
The deviation from the average decreases as the number of trials increases, emphasizing the relationship between probability and statistics in predicting and analyzing outcomes.']}, {'end': 13978.977, 'start': 13402.651, 'title': 'Understanding probability and statistics', 'summary': 'Explores the concepts of probability and statistics, covering the normal distribution, z-score, cumulative probability, and their practical applications, including a use case of predicting poker hand probabilities using Python.', 'duration': 576.326, 'highlights': ['The chapter discusses the concepts of probability and statistics, including the challenges in calculating probabilities for real-world scenarios such as disease development and car component failure. The chapter discusses the challenges in calculating probabilities for real-world scenarios such as disease development and car component failure.', 'Explains the importance of using data and statistics to calculate probabilities and how confidence in the calculated probabilities increases with more data. Explains the importance of using data and statistics to calculate probabilities and how confidence in the calculated probabilities increases with more data.', 'Discusses the complexity of comparing groups of scores, such as wine ratings, due to the range in scores and the nature of the data, leading to the question of how to determine which wine is better than the other. Discusses the complexity of comparing groups of scores, such as wine ratings, due to the range in scores and the nature of the data, leading to the question of how to determine which wine is better than the other.', 'Explains the normal distribution, its qualities, and its significance in probability and statistics, including the concept of mean, frequency, and the central limit theorem. 
', "Discusses the three-sigma rule and the z-score, explaining their relevance in understanding the distribution of observations and in calculating the probability of a data point's distance from the mean.", 'Presents a practical use case of predicting poker hand probabilities using Python, including the manual calculation and code to determine the probability of drawing an ace, a heart, a face card, and specific poker hands.']}, {'end': 14397.291, 'start': 13979.597, 'title': 'Probability in poker', 'summary': 'Explains the card dealing process in poker, calculates the probability of specific card combinations, and explores the concepts of mutually exclusive events, non-mutually exclusive events, independent events, dependent events, and expected value.', 'duration': 417.694, 'highlights': ['The chapter explains the card dealing process in poker, calculates the probability of specific card combinations, and explores mutually exclusive events, non-mutually exclusive events, independent events, dependent events, and expected value.', 'The probability of hitting a flush draw on the river card is approximately 20%. 
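
The card probabilities in this chapter can be reproduced by counting cards in a standard 52-card deck. The sketch below uses fractions.Fraction for exact values. It assumes the river scenario has 46 unseen cards (2 hole cards and 4 board cards visible), with 9 outs for a flush draw and 8 outs for an open-ended straight draw; that matches the roughly 20% and 17% figures, but the exact hands used in the video are an assumption.

```python
from fractions import Fraction

DECK = 52
UNSEEN_ON_RIVER = DECK - 2 - 4  # 2 hole cards and 4 board cards are already visible

# Drawing to a hand on the river: outs divided by unseen cards
flush_draw = Fraction(9, UNSEEN_ON_RIVER)     # 9 remaining cards of the suit
straight_draw = Fraction(8, UNSEEN_ON_RIVER)  # 8 cards complete an open-ended straight

# Single draws from a full deck, using P(A or B) = P(A) + P(B) - P(A and B)
heart_or_club = Fraction(13 + 13, DECK)        # mutually exclusive suits
ace_king_or_queen = Fraction(4 + 4 + 4, DECK)  # mutually exclusive ranks
heart_or_ace = Fraction(13 + 4 - 1, DECK)      # the ace of hearts is counted only once
red_or_face = Fraction(26 + 12 - 6, DECK)      # 6 red face cards (J, Q, K in two suits)

# Two aces in a row
two_aces_with_replacement = Fraction(4, DECK) ** 2                        # independent events
two_aces_without_replacement = Fraction(4, DECK) * Fraction(3, DECK - 1)  # dependent events

results = {
    "flush draw on the river": flush_draw,
    "straight draw on the river": straight_draw,
    "heart or club": heart_or_club,
    "ace, king, or queen": ace_king_or_queen,
    "heart or ace": heart_or_ace,
    "red card or face card": red_or_face,
    "two aces, with replacement": two_aces_with_replacement,
    "two aces, without replacement": two_aces_without_replacement,
}
for name, p in results.items():
    print(f"{name}: {p} = {float(p):.2%}")
```

The two-aces results come out as exactly 1/169 (about 0.59%) with replacement and 1/221 (about 0.45%) without replacement, which is the difference between independent and dependent events.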
', 'The probability of hitting a straight draw on the river card is approximately 17%.', 'The probability of drawing a heart or a club is 50%, and the probability of drawing an ace, a king, or a queen is about 23%.', 'The probability of drawing a heart or an ace is about 30.8% (16/52), and the probability of drawing a red card or a face card is about 61.5% (32/52).', 'The probability of drawing two aces in a row with replacement (independent events) is about 0.6%.', 'The probability of drawing two aces in a row without replacement (dependent events) is about 0.45%.']}, {'end': 14895.524, 'start': 14397.311, 'title': 'Expected value and hypothesis testing', 'summary': 'Explains the concept of expected value in poker using examples and code, highlighting its importance and application. 
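
Expected value weighs each outcome by its probability: EV = sum of p(i) times x(i). A minimal sketch of a call-or-fold decision follows; the pot of 100 chips, the 20-chip call, and the use of the 9-out flush draw (probability 9/46 on the river) are hypothetical numbers for illustration, not figures from the video.

```python
from fractions import Fraction


def expected_value(outcomes):
    """Sum of probability * payoff over (probability, payoff) pairs."""
    if sum(p for p, _ in outcomes) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)


# Hypothetical spot: pay 20 chips to call for a pot of 100 chips with a flush draw
p_win = Fraction(9, 46)
call_ev = expected_value([(p_win, 100), (1 - p_win, -20)])
fold_ev = 0  # folding neither wins nor loses further chips

print(call_ev, float(call_ev))  # 80/23, about 3.48 chips per call
print("call" if call_ev > fold_ev else "fold")
```

A positive EV means the call gains chips on average over many repetitions, even though it loses about 80% of the time (37 of 46 rivers).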
It then delves into hypothesis testing, providing examples and definitions of null and alternate hypotheses.', 'duration': 498.213, 'highlights': ['Explains the concept of expected value in poker using examples and code, demonstrating its practical application.', 'Emphasizes the critical role of expected value in poker, guiding decision-making at the poker table.', 'Introduces hypothesis testing, with clear examples and definitions of null and alternate hypotheses.']}], 'duration': 1685.022, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA13210502.jpg', 'highlights': ['The chapter covers the concept of probability through a coin toss simulation, demonstrating the relationship between probability and actual outcomes.', 'Probability serves as the framework for making predictions about the occurrence of events, while statistics involves testing these theories and drawing conclusions from the observed data.', 'The deviation from the average decreases as the number of trials increases, emphasizing the relationship between probability and statistics in predicting and analyzing outcomes.', 'The chapter discusses the challenges in calculating probabilities for real-world scenarios such as disease development and car component failure.', 'Explains the importance of using data and statistics to calculate probabilities, and how confidence in the calculated probabilities increases with more data.', 'Discusses the complexity of comparing groups of scores, such as wine ratings, due to the range in scores and the nature of the data, leading to the question of how to determine which wine is better 
than the other.', 'Explains the normal distribution, its properties, and its significance in probability and statistics, including the mean, frequency, and the central limit theorem.', "Discusses the three-sigma rule and the z-score, explaining their relevance in understanding the distribution of observations and in calculating the probability of a data point's distance from the mean.", 'Presents a practical use case of predicting poker hand probabilities using Python, including the manual calculation and code to determine the probability of drawing specific poker hands.', 'The chapter explains the card dealing process in poker, calculates the probability of specific card combinations, and explores concepts such as mutually exclusive events, non-mutually exclusive events, independent events, dependent events, and expected value.', 'The probability of hitting a flush draw on the river card is approximately 20%.', 'The probability of hitting a straight draw on the river card is approximately 17%.', 'The probability of drawing a heart or a club is 50%, and the probability of drawing an ace, a king, or a queen is about 23%.', 'The probability of drawing a heart or an ace is about 30.8% (16/52), and the probability of drawing a red card or a face card is about 61.5% (32/52).', 'The probability of drawing two aces in a row with replacement (independent events) is about 0.6%.', 'The probability of drawing two aces in a row without replacement (dependent events) is about 0.45%.', 'The chapter explains the concept of expected value in poker using examples and code, demonstrating its practical application.', 'Expected value is highlighted as critical in poker, guiding decision-making at the poker table.', 'Introduces hypothesis testing, with clear examples and definitions of null and alternate hypotheses.']}, {'end': 15874.423, 'segs': [{'end': 14934.88, 'src': 'embed', 'start': 14895.524, 'weight': 0, 'content': [{'end': 14899.647, 'text': 'and the claim is always about the status quo.', 'start': 14895.524, 'duration': 4.123}, 
{'end': 14904.011, 'text': 'so you can use the following rule to formulate the null and alternate hypotheses.', 'start': 14899.647, 'duration': 4.364}, {'end': 14909.875, 'text': 'the null hypothesis always has one of the following signs: equal to, less than or equal to, or greater than or equal to,', 'start': 14904.011, 'duration': 5.864}, {'end': 14914.679, 'text': 'and the alternate hypothesis always has one of the following signs: not equal to, greater than, or less than.', 'start': 14909.875, 'duration': 4.804}, {'end': 14915.339, 'text': 'keep that in mind.', 'start': 14914.679, 'duration': 0.66}, {'end': 14916.82, 'text': 'this is a very important point.', 'start': 14915.339, 'duration': 1.481}, {'end': 14918.902, 'text': "now let's see an example from Flipkart.", 'start': 14916.82, 'duration': 2.082}, {'end': 14925.394, 'text': 'Suppose Flipkart claimed that its total valuation in December 2020 was at least $20 billion.', 'start': 14919.511, 'duration': 5.883}, {'end': 14928.116, 'text': 'Here, the claim contains a greater-than-or-equal-to sign.', 'start': 14925.915, 'duration': 2.201}, {'end': 14930.617, 'text': 'So the null hypothesis is the original claim.', 'start': 14928.556, 'duration': 2.061}, {'end': 14934.88, 'text': 'The hypotheses in this case can be formulated as you can see here.', 'start': 14931.218, 'duration': 3.662}], 'summary': 'Formulate null and alternate hypotheses using equalities and inequalities, illustrated with a $20 billion valuation claim.', 'duration': 39.356, 'max_score': 14895.524, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA14895524.jpg'}, {'end': 15071.403, 'src': 'embed', 'start': 15042.054, 'weight': 4, 'content': [{'end': 15044.374, 'text': 'Because 70 is far away from 20.', 'start': 15042.054, 'duration': 2.32}, {'end': 15046.155, 'text': 'So you can reject the null hypothesis.', 'start': 15044.374, 'duration': 1.781}, {'end': 15048.915, 
'text': 'But suppose the average score over 5 games is 65.', 'start': 15046.395, 'duration': 2.52}, {'end': 15051.536, 'text': 'In that case, you are more likely to believe his claim,', 'start': 15048.915, 'duration': 2.621}, {'end': 15059.658, 'text': 'because it is near 70, or even above 70; suppose he got 80 or 90, in which case Avelar is underestimating himself.', 'start': 15051.854, 'duration': 7.804}, {'end': 15061.459, 'text': 'So we can see this graph here.', 'start': 15060.138, 'duration': 1.321}, {'end': 15064.44, 'text': 'So you can say the status quo is 70.', 'start': 15061.999, 'duration': 2.441}, {'end': 15068.382, 'text': 'Suppose that for the five games he scored a little lower,', 'start': 15064.44, 'duration': 3.942}, {'end': 15071.403, 'text': 'say around 55, 60, or 50.', 'start': 15068.842, 'duration': 2.561}], 'summary': 'Reject the null hypothesis if the average score is far from 70; the claim is more believable when scores are near or above 70.', 'duration': 29.349, 'max_score': 15042.054, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA15042054.jpg'}], 'start': 14895.524, 'title': 'Hypothesis testing methods', 'summary': 'Discusses hypothesis formulation, hypothesis testing, the critical value method, and the p-value method using examples such as the Flipkart valuation claim, archery scores, and AC unit demand, emphasizing the decision-making process based on critical values and acceptance regions. 
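
The critical value and p-value methods summarized in this section can be sketched with statistics.NormalDist. The sketch assumes a two-tailed z-test at a 5% significance level and treats 15 as the standard error of the sample mean. Under those assumptions it reproduces the critical value of about 380 and a p-value close to the 18.02% reported in the chapter summaries; the small gap is consistent with z having been rounded to 1.34 in the video.

```python
from statistics import NormalDist

mu0 = 350            # hypothesized population mean under H0
se = 15              # assumed standard error of the sample mean
alpha = 0.05         # significance level
sample_mean = 370.16

std_normal = NormalDist()

# Critical value method: two-tailed acceptance region around mu0
z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96
lower, upper = mu0 - z_crit * se, mu0 + z_crit * se
print(f"acceptance region: {lower:.1f} to {upper:.1f}")
reject_by_critical_value = not (lower <= sample_mean <= upper)

# p-value method: probability of a result at least this extreme under H0
z = (sample_mean - mu0) / se
p_value = 2 * (1 - std_normal.cdf(abs(z)))
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
reject_by_p_value = p_value < alpha

print(reject_by_critical_value, reject_by_p_value)  # both False: fail to reject H0
```

For comparison, 2 * (1 - NormalDist().cdf(1.34)) evaluates to about 0.1802.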
It also covers types of errors in hypothesis testing, including type 1 and type 2 errors, and their implications in real-world scenarios.', 'chapters': [{'end': 14934.88, 'start': 14895.524, 'title': 'Hypothesis formulation and testing', 'summary': "Discusses the formulation of null and alternate hypotheses using specific signs and provides an example of hypothesis testing in the context of Flipkart's valuation claim in December 2020.", 'duration': 39.356, 'highlights': ['The null hypothesis always contains one of the signs equal to, less than or equal to, or greater than or equal to, while the alternate hypothesis contains one of the signs not equal to, greater than, or less than.', 'An example of Flipkart claiming a valuation of at least $20 billion in December 2020 is used to illustrate the formulation of null and alternate hypotheses, with the null hypothesis being the original claim.']}, {'end': 15463.167, 'start': 14935.08, 'title': 'Hypothesis testing & critical value method', 'summary': 'Explains null and alternate hypotheses, hypothesis testing, the critical value method, and different types of tests, using examples such as archery scores and AC unit demand, emphasizing the decision-making process based on critical values and acceptance regions.', 'duration': 528.087, 'highlights': ['The chapter explains null and alternate hypotheses and their significance in hypothesis testing, showing how signs such as equal to, greater than, and less than are used to formulate and distinguish the two hypotheses.', 
'Using the example of archery scores, the chapter illustrates decision-making in hypothesis testing based on critical values: an average score of 20 leads to rejecting the null hypothesis, while a score of 65 or more leads to failing to reject the claim.', 'The chapter further discusses the critical value method, acceptance regions, and different types of tests, including two-tailed tests commonly used in healthcare systems and lower-tail and upper-tail tests applicable in manufacturing, with detailed examples and implications.', 'An illustrative example of AC unit demand analysis is presented, outlining the formulation of null and alternate hypotheses, the significance of the mean demand, and the decision-making process based on sample means and standard error calculations.']}, {'end': 15874.423, 'start': 15464.011, 'title': 'Hypothesis testing: critical value vs. 
p-value methods', 'summary': 'Explains the critical value method and the p-value method for hypothesis testing, using an example with a mean of 350 and a standard deviation of 15, which gives a critical value of about 380 and a p-value of 18.02%. It also covers the types of errors in hypothesis testing, including type 1 and type 2 errors, and their implications in real-world scenarios.', 'duration': 410.412, 'highlights': ['The critical value method is demonstrated with a mean of 350 and a standard deviation of 15, giving a critical value of about 380 at a 5% significance level (the accepted probability of a type 1 error). Mean: 350, Standard Deviation: 15, Critical Value: 380, Significance Level: 5%', 'The p-value method is shown with a sample mean of 370.16, which yields a p-value of 18.02%; since this exceeds the 5% significance level, the null hypothesis is not rejected. Sample Mean: 370.16, P-Value: 18.02%, Significance Level: 5%, Decision: Fail to Reject Null Hypothesis', 'Explains type 1 and type 2 errors in hypothesis testing, emphasizing the need for concrete evidence and the potential societal impact of erroneous judgments. 
Type 1 Error: Rejecting a True Null Hypothesis, Type 2 Error: Failing to Reject a False Null Hypothesis, Emphasis on Concrete Evidence and Societal Impact']}], 'duration': 978.899, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA14895524.jpg', 'highlights': ['The chapter discusses the formulation of null and alternate hypotheses using signs such as equal to, less than or equal to, and greater than or equal to, with examples like the Flipkart valuation claim and AC unit demand.', 'It emphasizes the decision-making process in hypothesis testing based on critical values and acceptance regions, illustrated through examples like archery scores and AC unit demand analysis.', 'The chapter delves into the critical value method, acceptance regions, and different types of tests, providing examples and implications of two-tailed tests in healthcare systems and lower-tail and upper-tail tests in manufacturing.', 'It demonstrates the critical value method with a mean of 350 and a standard deviation of 15, giving a critical value of about 380 at a 5% significance level.', 'The p-value method is shown with a sample mean of 370.16, which yields a p-value of 18.02%; since this exceeds the 5% significance level, the null hypothesis is not rejected.', 'The chapter explains type 1 and type 2 errors in hypothesis testing, emphasizing the need for concrete evidence and the potential societal impact of erroneous judgments.']}, {'end': 17153.189, 'segs': [{'end': 16205.38, 'src': 'embed', 'start': 16179.829, 'weight': 2, 'content': [{'end': 16185.751, 'text': 'So the sample mean comes out to be 0.39 and the sample size is 7,400.', 'start': 16179.829, 'duration': 5.922}, {'end': 16189.673, 'text': 'So we have taken a sample of 7,400 searches in total.', 'start': 16185.751, 'duration': 3.922}, {'end': 16192.694, 'text': 'So that brings the sample size to 7,400.', 'start': 
16189.673, 'duration': 3.021}, {'end': 16195.516, 'text': 'So the next step is to compute the test statistic.', 'start': 16192.694, 'duration': 2.822}, {'end': 16202.139, 'text': 'So now that we have the required sample parameters, we need to compute the test statistic for the given sample.', 'start': 16196.196, 'duration': 5.943}, {'end': 16205.38, 'text': 'For this, we also need the standard deviation of the sampling distribution.', 'start': 16202.519, 'duration': 2.861}], 'summary': 'Sample mean is 0.39 with a sample size of 7,400.', 'duration': 25.551, 'max_score': 16179.829, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA16179829.jpg'}, {'end': 16718.193, 'src': 'embed', 'start': 16688.14, 'weight': 9, 'content': [{'end': 16690.542, 'text': 'So now the question is when the t distribution is used.', 'start': 16688.14, 'duration': 2.402}, {'end': 16700.027, 'text': 'The most important use of the t distribution is when the population standard deviation sigma is unknown and has to be approximated by the sample standard deviation s.', 'start': 16691.122, 'duration': 8.905}, {'end': 16705.452, 'text': 'However, as the sample size increases beyond 30, the t value approaches the z value.', 'start': 16700.027, 'duration': 5.425}, {'end': 16710.635, 'text': 'Thus, if you summarize the decision making in a flow chart, it will look like this.', 'start': 16705.892, 'duration': 4.743}, {'end': 16718.193, 'text': 'This shows how the method of making a decision changes if you use the sample standard deviation instead of the population standard deviation.', 'start': 16711.408, 'duration': 6.785}], 'summary': 'The t distribution is used when the population standard deviation is estimated from the sample standard deviation; for sample sizes above 30, t values approach z values.', 'duration': 30.053, 'max_score': 16688.14, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA16688140.jpg'}, {'end': 16829.001, 'src': 'embed', 
'start': 16802.831, 'weight': 0, 'content': [{'end': 16809.594, 'text': 'For example, if you are testing a new drug, you would compare its effectiveness to that of the standard available drug.', 'start': 16802.831, 'duration': 6.763}, {'end': 16815.876, 'text': 'So you would take a sample of patients who take the new drug and compare them to those who take the standard drug.', 'start': 16810.034, 'duration': 5.842}, {'end': 16817.676, 'text': 'So we will do a test for it.', 'start': 16816.296, 'duration': 1.38}, {'end': 16819.077, 'text': 'We can do it in Excel.', 'start': 16817.796, 'duration': 1.281}, {'end': 16822.218, 'text': 'Excel is very heavily used in industry.', 'start': 16819.517, 'duration': 2.701}, {'end': 16824.62, 'text': 'So this is the data set we have.', 'start': 16823.14, 'duration': 1.48}, {'end': 16829.001, 'text': 'We will be performing multiple kinds of t-tests in Excel.', 'start': 16825.02, 'duration': 3.981}], 'summary': 'Comparing the effectiveness of a new drug to a standard drug using t-tests in Excel.', 'duration': 26.17, 'max_score': 16802.831, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA16802831.jpg'}, {'end': 16916.645, 'src': 'embed', 'start': 16891.39, 'weight': 1, 'content': [{'end': 16897.654, 'text': "for example, person A has a commute time of 27 on day one and 23 on day two.", 'start': 16891.39, 'duration': 6.264}, {'end': 16902.998, 'text': 'similarly, person B has commute times of 43 on day one and 42 on day two.', 'start': 16897.654, 'duration': 5.344}, {'end': 16909.78, 'text': 'so now we have to check whether the average time equals the reference value:', 'start': 16902.998, 'duration': 6.782}, {'end': 16912.422, 'text': 'we have taken the average commute time to be 35.', 'start': 16909.78, 'duration': 2.642}, {'end': 16916.645, 'text': 'so we check whether it is 35 or not for day one and day two, and we find the difference.', 'start': 16912.422, 'duration': 4.223}], 'summary': 'Average commute time of person A is 25 minutes and of person B is 42.5 minutes.', 'duration': 25.255, 'max_score': 16891.39, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA16891390.jpg'}, {'end': 17000.186, 'src': 'embed', 'start': 16972.304, 'weight': 8, 'content': [{'end': 16974.804, 'text': 'As output, you can get the results in a new sheet.', 'start': 16972.304, 'duration': 2.5}, {'end': 16977.585, 'text': 'Alternatively, you can select an output range,', 'start': 16975.164, 'duration': 2.421}, {'end': 16979.625, 'text': 'so that you get the results in the same sheet.', 'start': 16977.905, 'duration': 1.72}, {'end': 16982.786, 'text': 'We can select the output range.', 'start': 16981.626, 'duration': 1.16}, {'end': 16987.147, 'text': 'So I will select columns E and F.', 'start': 16983.686, 'duration': 3.461}, {'end': 16989.488, 'text': 'So that is the output range; just click OK.', 'start': 16987.147, 'duration': 2.341}, {'end': 16991.343, 'text': 'So you got the results here.', 'start': 16990.043, 'duration': 1.3}, {'end': 16994.424, 'text': 'You can see the test is labeled t-Test: Paired Two Sample for Means.', 'start': 16991.704, 'duration': 2.72}, {'end': 16999.086, 'text': "And you can see there is a small difference in the means of variables 1 and 2.", 'start': 16994.784, 'duration': 4.302}, {'end': 17000.186, 'text': 'Just a second.', 'start': 16999.086, 'duration': 1.1}], 'summary': 'Output results to a new sheet or to a selected range (columns E and F), and compare the means of variables 1 and 2.', 'duration': 27.882, 'max_score': 16972.304, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA16972304.jpg'}, {'end': 17039.39, 'src': 'embed', 'start': 17012.65, 'weight': 5, 'content': [{'end': 17016.331, 'text': 'Also, a p-value is given for both the one-tailed and the two-tailed test.', 
'start': 17012.65, 'duration': 3.681}, {'end': 17019.402, 'text': 'so you can see that for both the one-tailed and the two-tailed test,', 'start': 17016.781, 'duration': 2.621}, {'end': 17027.605, 'text': 'the p-values are larger than the reference values of 0.05 and 0.025, and so we can easily say that,', 'start': 17019.402, 'duration': 8.203}, {'end': 17031.267, 'text': 'comparing the values, the p-values of the one-tailed and two-tailed tests are higher,', 'start': 17027.605, 'duration': 3.662}, {'end': 17037.369, 'text': "so we fail to reject the null hypothesis; we do not have enough evidence to reject it.", 'start': 17031.267, 'duration': 6.102}, {'end': 17039.39, 'text': 'So that covers the paired two-sample mean test.', 'start': 17037.369, 'duration': 2.021}], 'summary': 'Both one-tailed and two-tailed tests have p-values larger than 0.05, so the null hypothesis is not rejected due to insufficient evidence.', 'duration': 26.74, 'max_score': 17012.65, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA17012650.jpg'}], 'start': 15875.103, 'title': 'Understanding CTR and hypothesis testing in e-commerce', 'summary': 'Discusses the significance of click-through rate (CTR) in e-commerce, introduces a problem statement, discusses hypothesis testing for search CTR, analyzes search CTR data with a sample size of 7,400 and an average CTR of 0.39, and covers hypothesis testing methods for industry applications.', 'chapters': [{'end': 15977.739, 'start': 15875.103, 'title': 'Understanding click-through rate in e-commerce', 'summary': 'Discusses the significance of click-through rate (CTR) in the e-commerce industry, explaining its definition and relevance, and introduces the industry problem statement to be addressed in the case study.', 'duration': 102.636, 'highlights': ['Click-through rate (CTR) is a 
crucial business metric in the e-commerce industry, particularly for businesses with a website or application presence.', 'CTR measures the action rate of entities such as a banner ad or a website homepage, and in the context of online search it can be defined as the proportion of successful searches.', 'The relevance of CTR in online marketing is highlighted, emphasizing its importance in measuring the success of searches and the actions taken by users.', 'Search CTR is computed as the number of successful searches divided by the total number of searches.', 'The chapter introduces an industry problem statement to be addressed in the case study, setting the context for the upcoming discussion and analysis.']}, {'end': 16140.054, 'start': 15977.739, 'title': 'Hypothesis testing for search CTR at Flipkart', 'summary': 'Discusses hypothesis testing to verify the claim of a product manager that the search CTR has increased from 35% to 40%, using a sample of 14 days of data and explaining the need for a representative sample.', 'duration': 162.315, 'highlights': ["The product manager claims that the search CTR has increased from 35% to 40%, and hypothesis testing is used to check this claim on a sample of 14 days of data.", "Explains the need for a representative sample: because search CTR varies from day to day, the average search CTR should be computed from the mean CTR over several days.", "Discusses the formulation of null and alternate hypotheses based on the product manager's claim and the variation in the sample search CTR data. 
"]}, {'end': 16352.201, 'start': 16140.655, 'title': 'Search CTR analysis and hypothesis testing', 'summary': 'Discusses the analysis of search click-through rate (CTR) data, including a sample size of 7,400, an average search CTR of 0.39, and the computation of the test statistic for hypothesis testing at a significance level of 0.05.', 'duration': 211.546, 'highlights': ['The total number of searches is 7,400 and the number of clicks is 2,885, giving an average search CTR of 2,885 / 7,400, or about 0.39.', 'The sample size is 7,400 and the sample mean for the search CTR is 0.39; both are key parameters for the hypothesis test.', 'The level of significance (alpha) for the hypothesis test is set at 0.05.']}, {'end': 16854.916, 'start': 16352.621, 'title': 'Hypothesis testing and t-test in industry', 'summary': 'Discusses finding the critical region and making decisions through both the critical value and p-value approaches, and demonstrates how hypothesis testing and the t-test are used in industry, emphasizing the significance of the t-test in practical scenarios.', 'duration': 502.295, 'highlights': ["The p-value approach gives the same result as the critical value approach: there is still insufficient evidence to reject the product manager's claim that the click-through rate increased to 40%. 
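
The CTR test above can be sketched as a one-sample proportion z-test. The sketch assumes H0: p = 0.40 (the product manager's claim) against a two-sided alternative at alpha = 0.05, and uses the sample CTR rounded to 0.39 as in the video; under those assumptions it gives a p-value of about 0.079. The function name is illustrative.

```python
from statistics import NormalDist


def one_proportion_z_test(p_hat, p0, n):
    # Standard error of the sample proportion, computed under H0: p = p0
    se = (p0 * (1 - p0) / n) ** 0.5
    z = (p_hat - p0) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


clicks, searches = 2885, 7400
p_hat = round(clicks / searches, 2)  # 0.39, the rounded sample CTR used in the video
z, p_value = one_proportion_z_test(p_hat, p0=0.40, n=searches)

alpha = 0.05
print(f'z = {z:.3f}, p-value = {p_value:.3f}')
print('reject H0' if p_value < alpha else 'fail to reject H0')
```

Using the unrounded proportion 2885 / 7400 gives a p-value of about 0.075 instead, which is still above 0.05, so the conclusion does not change.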
The p-value approach yields a p-value of 0.079, which is greater than the significance level of 0.05, so the null hypothesis is not rejected and the product manager's claim cannot be disproved.", 'Emphasizes the significance of the t-test in practical scenarios: it is the practical choice when the population standard deviation is unknown and has to be estimated from the sample standard deviation.', 'Explains the distinction between paired and unpaired two-sample mean tests and their practical applications, such as testing the effectiveness of a new drug, and demonstrates using Excel to conduct multiple t-tests.']}, {'end': 17153.189, 'start': 16854.916, 'title': 'Hypothesis testing methods', 'summary': 'Covers the one-sample mean test, the paired two-sample mean test, and the unpaired two-sample mean test, providing detailed steps and statistical results for each.', 'duration': 298.273, 'highlights': ["The chapter explains the paired two-sample mean test using commute time data for day one and day two, and demonstrates the steps for performing the test with Excel's Analysis ToolPak, yielding results that fail to reject the null hypothesis. 
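
A paired two-sample t-test works on the per-person differences between the two days. The sketch below computes the paired t statistic with the standard library; only persons A (27 and 23) and B (43 and 42) come from the video, and the other three rows are hypothetical. The critical value 2.776 is the standard two-tailed t-table value for alpha = 0.05 with 4 degrees of freedom; if SciPy is installed, scipy.stats.ttest_rel(day_one, day_two) should return the same t statistic along with an exact p-value.

```python
import math
import statistics

# Commute times in minutes; rows 3 to 5 are hypothetical
day_one = [27, 43, 30, 35, 38]
day_two = [23, 42, 33, 34, 36]

# Paired test: work with the difference for each person
diffs = [a - b for a, b in zip(day_one, day_two)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)

t_stat = mean_diff / (sd_diff / math.sqrt(n))
t_crit = 2.776  # two-tailed, alpha = 0.05, df = n - 1 = 4 (from a t table)

print(f't = {t_stat:.3f}')
print('reject H0' if abs(t_stat) > t_crit else 'fail to reject H0')
```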
Commute time data for day one and day two, demonstration using Excel's analysis tool pack, results with p-values higher than 0.05.", "It also covers the two sample mean test unpaired, showcasing the process for comparing commute times for two different locations, and provides the steps for performing the test using Excel's analysis tool pack, yielding results that allow for rejecting the null hypothesis. Comparison of commute times for two different locations, demonstration using Excel's analysis tool pack, results with p-values lower than 0.05.", 'The chapter introduces the two sample proportion test for categorical sample observations with two categories, such as true or false, male or female, and provides examples of such categorical observations. Introduction of two sample proportion test for categorical sample observations with two categories, examples of categorical observations.']}], 'duration': 1278.086, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA15875103.jpg', 'highlights': ['CTR measures the action rate of entities such as banner ads or website homepage, and can be defined as the proportion of successful searches in the context of online search.', 'The chapter introduces an industry problem statement to be addressed in the case study, setting the context for the upcoming discussion and analysis.', "The product manager's claim of a 40% search CTR increase from 35% is being tested using hypothesis testing with a sample of 14 days data.", "The need for a representative sample is emphasized due to the variation in daily search CTR, making it essential to compute the average from several days' mean CTR.", 'The chapter provides the total searches and clicks data, which leads to the calculation of the average search CTR.', 'The sample size and mean for the search CTR are crucial parameters for the hypothesis testing.', 'The level of significance or alpha for the hypothesis testing is set as 0.05.', 
"The p-value approach yields a p-value of 0.079, which is greater than the significance level of 0.05, indicating a failure to reject the null hypothesis, so the product manager's claim cannot be disproved.", 'The T-test is highlighted as a practical approach in industry when the population standard deviation is unknown, using the sample standard deviation as an estimate of the population standard deviation.', 'The distinction between paired and unpaired two-sample mean tests is explained, showcasing their applications in practical scenarios such as testing the effectiveness of a new drug, with an emphasis on using Excel for conducting T-tests in industry.', "Commute time data for day one and day two, demonstration using Excel's Analysis ToolPak, results with p-values higher than 0.05.", "Comparison of commute times for two different locations, demonstration using Excel's Analysis ToolPak, results with p-values lower than 0.05.", 'Introduction of the two-sample proportion test for categorical sample observations with two categories, examples of categorical observations.']}, {'end': 18601.466, 'segs': [{'end': 17239.952, 'src': 'embed', 'start': 17216.107, 'weight': 3, 'content': [{'end': 17224.809, 'text': "And there's one more thing called XLSTAT. For that, you can simply search for XLSTAT and you will get it for 30 days.", 'start': 17216.107, 'duration': 8.702}, {'end': 17227.169, 'text': 'So you can get a free trial of XLSTAT.', 'start': 17224.809, 'duration': 2.36}, {'end': 17232.57, 'text': 'It is a 14-day free trial; you can click here and get it added to your Excel.', 'start': 17227.169, 'duration': 5.401}, {'end': 17235.071, 'text': 'So I already have it added in my Excel.', 'start': 17232.57, 'duration': 2.501}, {'end': 17236.891, 'text': 'So I will just go to XLSTAT here.', 'start': 17235.071, 'duration': 1.82}, {'end': 17239.952, 'text': 'Here, this is the place where it gets 
added.', 'start': 17237.27, 'duration': 2.682}], 'summary': 'XLSTAT offers a free trial (described as both 30 days and 14 days) and can be easily added to Excel.', 'duration': 23.845, 'max_score': 17216.107, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA17216107.jpg'}, {'end': 17369.914, 'src': 'embed', 'start': 17342.5, 'weight': 7, 'content': [{'end': 17346.344, 'text': 'We see that the p-value is 0.309, which is larger than 0.05.', 'start': 17342.5, 'duration': 3.844}, {'end': 17347.746, 'text': 'So the answer is also given there.', 'start': 17346.344, 'duration': 1.402}, {'end': 17350.028, 'text': 'The difference between the proportions is equal to 0.', 'start': 17347.766, 'duration': 2.262}, {'end': 17352.15, 'text': 'And the difference between the proportions is different from 0.', 'start': 17350.028, 'duration': 2.122}, {'end': 17353.612, 'text': 'That is the alternate hypothesis.', 'start': 17352.15, 'duration': 1.462}, {'end': 17359.878, 'text': 'But as the computed p-value is greater than the significance level, we cannot reject the null hypothesis.', 'start': 17354.092, 'duration': 5.786}, {'end': 17361.22, 'text': 'So we got the answer here.', 'start': 17360.279, 'duration': 0.941}, {'end': 17363.282, 'text': "So now let's move on to the next test.", 'start': 17361.58, 'duration': 1.702}, {'end': 17365.112, 'text': 'The next test is A-B testing.', 'start': 17363.871, 'duration': 1.241}, {'end': 17369.914, 'text': 'A-B testing is a direct industry application of the two-sample proportion test.', 'start': 17365.352, 'duration': 4.562}], 'summary': 'P-value is 0.309, greater than 0.05, so the null hypothesis cannot be rejected.', 'duration': 27.414, 'max_score': 17342.5, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA17342500.jpg'}, {'end': 17514.551, 'src': 'embed', 'start': 17485.453, 'weight': 5, 'content': [{'end': 
17491.518, 'text': 'For example, Electronic Arts wanted to A-B test different versions of its sales page to identify how it could increase sales exponentially.', 'start': 17485.453, 'duration': 6.065}, {'end': 17498.403, 'text': "So I'm highlighting this particular A-B testing example because it shows how a hypothesis and conventional wisdom can blow up in your face.", 'start': 17491.658, 'duration': 6.745}, {'end': 17504.027, 'text': 'For instance, most marketers assume that advertising an incentive will result in increased sales.', 'start': 17498.823, 'duration': 5.204}, {'end': 17505.869, 'text': "That's often the case, but not this time.", 'start': 17504.247, 'duration': 1.622}, {'end': 17514.551, 'text': 'The original version of the pre-order page, which you can see here, offered 20% off a future purchase to anyone who bought SimCity 5.', 'start': 17506.269, 'duration': 8.282}], 'summary': 'EA conducted an A-B test on its sales page, defying conventional wisdom about the 20% off incentive.', 'duration': 29.098, 'max_score': 17485.453, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA17485453.jpg'}, {'end': 18120.543, 'src': 'embed', 'start': 18093.523, 'weight': 0, 'content': [{'end': 18096.907, 'text': 'The first query is how to create a database. To explain this,', 'start': 18093.523, 'duration': 3.384}, {'end': 18098.969, 'text': 'let me jump into my MySQL Workbench.', 'start': 18097.087, 'duration': 1.882}, {'end': 18103.52, 'text': 'This is the interface of my MySQL Workbench. On the left side of the screen,', 'start': 18099.679, 'duration': 3.841}, {'end': 18106.12, 'text': 'we can see the database. In the center area,', 'start': 18103.66, 'duration': 2.46}, {'end': 18108.901, 'text': 'we are going to write our queries. At the bottom part,', 'start': 18106.36, 'duration': 2.541}, {'end': 18111.801, 'text': 'we are going to get the status of the query that has been executed.', 'start': 18109.041, 
'duration': 2.76}, {'end': 18116.702, 'text': "So you might wonder why I'm using MySQL Workbench for executing SQL queries.", 'start': 18112.421, 'duration': 4.281}, {'end': 18120.543, 'text': 'Well, SQL is a language that is used to communicate with the database.', 'start': 18117.182, 'duration': 3.361}], 'summary': 'Using MySQL Workbench to execute SQL queries for communicating with the database.', 'duration': 27.02, 'max_score': 18093.523, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA18093523.jpg'}, {'end': 18461.699, 'src': 'embed', 'start': 18407.951, 'weight': 1, 'content': [{'end': 18416.875, 'text': "I'm using the student table, and the student table has student ID, first name, last name, address, city, and marks as its columns.", 'start': 18407.951, 'duration': 8.924}, {'end': 18428.2, 'text': 'So the syntax for the SELECT statement is SELECT * FROM student. Since we have to display all the columns,', 'start': 18417.615, 'duration': 10.585}, {'end': 18429.681, 'text': "I'm using the star operator.", 'start': 18428.42, 'duration': 1.261}, {'end': 18431.922, 'text': 'So let me select this entire query.', 'start': 18430.241, 'duration': 1.681}, {'end': 18436.463, 'text': 'So we can see that the entire table has been displayed.', 'start': 18433.962, 'duration': 2.501}, {'end': 18440.025, 'text': 'So this is one of the ways in which we can use the SELECT statement.', 'start': 18437.024, 'duration': 3.001}, {'end': 18443.847, 'text': '
Another method: if you want to select a particular column,', 'start': 18440.525, 'duration': 3.322}, {'end': 18453.976, 'text': 'then we have to specify that column name, and the syntax for that is SELECT column_name FROM table_name. So we can see that from the table,', 'start': 18443.847, 'duration': 10.129}, {'end': 18456.597, 'text': 'we have selected only first name and last name.', 'start': 18454.156, 'duration': 2.441}, {'end': 18461.699, 'text': 'The SELECT statement is used to select data from a database.', 'start': 18458.157, 'duration': 3.542}], 'summary': 'Using the SELECT statement to display all columns or specific ones from the student table.', 'duration': 53.748, 'max_score': 18407.951, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA18407951.jpg'}], 'start': 17153.729, 'title': 'Understanding statistical and database concepts', 'summary': 'Covers topics such as conducting a two-sample proportion test in Excel with a p-value of 0.309, A-B testing with a 40% sales increase demonstrated at Electronic Arts, an introduction to SQL, and the basics of tables and SQL queries.', 'chapters': [{'end': 17377.297, 'start': 17153.729, 'title': 'Two sample proportion test', 'summary': 'Explores the process of conducting a two-sample proportion test in Excel, demonstrating the steps and interpreting the results, with a p-value of 0.309 indicating that we cannot reject the null hypothesis.', 'duration': 223.568, 'highlights': ['The process of conducting a two sample proportion test in Excel The chapter explains the detailed steps of conducting a two sample proportion test in Excel, including selecting the frequency, setting the hypothesis size reference, significance level, and interpreting the results with a p-value of 0.309.', 'Interpreting the results with a p-value of 0.309 The p-value of 0.309 is larger than the significance level, indicating that we cannot reject the null hypothesis, which is further explained in the 
context of the difference between proportions and the alternate hypothesis.', 'Introduction to A-B testing The chapter introduces A-B testing as a direct industry application of the two sample proportion test, particularly in the context of e-commerce website development and decision-making for elements such as the shape of buttons.']}, {'end': 17921.367, 'start': 17377.297, 'title': 'Understanding a-b testing in excel', 'summary': 'Explains the concept of a-b testing, provides an example of a-b testing in electronic arts, demonstrating a 40% increase in sales through a-b testing, and then goes on to demonstrate a-b testing in excel with a sample dataset, showing how the hypothesis testing was done and how the concepts of hypothesis testing are connected to the industry.', 'duration': 544.07, 'highlights': ['A-B testing in Electronic Arts resulted in a 40% increase in sales through the elimination of a pre-order incentive, demonstrating the importance of A-B testing in understanding audience behavior. The A-B testing example in Electronic Arts showed a 40% increase in sales by eliminating a pre-order incentive, revealing the importance of A-B testing in understanding audience behavior and challenging conventional marketing assumptions.', 'The chapter demonstrates A-B testing in Excel with a sample dataset, showcasing the process of hypothesis testing and its connection to the industry. The chapter provides a demonstration of A-B testing in Excel using a sample dataset, including the process of hypothesis testing and its relevance to the industry, offering a practical understanding of A-B testing in a real-world context.', 'SQL, a core language for managing relational databases, was developed at IBM in the early 1970s and is known for its simplicity and declarative nature. 
SQL, developed at IBM in the early 1970s, is a core language for managing relational databases, known for its simplicity, English-like syntax, and declarative nature, making it easy to learn and work with.']}, {'end': 18204.268, 'start': 17921.828, 'title': 'Introduction to data, database, and sql', 'summary': 'Discusses the features of SQL, defines data and database, explains different types of databases, and demonstrates basic database queries using MySQL Workbench.', 'duration': 282.44, 'highlights': ['The chapter explains the features of SQL, emphasizing the ability to execute the same query in different systems with the same setup.', 'It defines data as values from different sources translated for a specific purpose, and explains the concept of a database with an analogy to a library.', 'It introduces different types of databases, including relational and NoSQL databases, and popular databases like MongoDB, Postgres, and MySQL.', 'The demonstration of basic database queries using MySQL Workbench includes creating and deleting a database, providing syntax and examples.', 'The chapter explores the concept of a table in a database as a collection of data in tabular form, consisting of rows and columns.']}, {'end': 18601.466, 'start': 18204.268, 'title': 'Understanding tables and sql queries', 'summary': 'Explains the concept of tables, including their elements such as rows, columns, cells, tuples, attributes, and constraints, and provides an introduction to SQL queries, including creating and deleting tables, and using basic SQL keywords like SELECT, WHERE, and OR.', 'duration': 397.198, 'highlights': ['A table can have any number of rows, but it must have a specified number of columns. Emphasizes the importance of columns in understanding the concept of a table.', "A table without a name doesn't exist. Tuples are single rows of a table, each containing a single record for that relation. 
Explains the necessity of a table having a name and introduces the concept of tuples.", 'The features of an entity are called attributes; an attribute has a name and a data type. Here, there are four attributes: ID, first name, last name, and date of birth. Defines attributes and provides specific examples of attributes in the given table.', 'Constraints are restrictions specified by the user while creating a table to ensure data integrity. Explains the purpose of constraints in maintaining data integrity.', 'The chapter also introduces basic SQL queries like creating a table, deleting a table, and using SELECT, WHERE, and OR. Summarizes the introduction of basic SQL queries and their functionality.']}], 'duration': 1447.737, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA17153729.jpg', 'highlights': ['A-B testing at Electronic Arts resulted in a 40% increase in sales through the elimination of a pre-order incentive, demonstrating the importance of A-B testing in understanding audience behavior.', 'SQL, developed at IBM in the early 1970s, is a core language for managing relational databases, known for its simplicity, English-like syntax, and declarative nature, making it easy to learn and work with.', 'The chapter introduces A-B testing as a direct industry application of the two-sample proportion test, particularly in the context of e-commerce website development and decision-making for elements such as the shape of buttons.', 'The process of conducting a two-sample proportion test in Excel, including selecting the frequency, setting the hypothesis size reference, significance level, and interpreting the results with a p-value of 0.309.', 'The chapter provides a demonstration of A-B testing in Excel using a sample dataset, including the process of hypothesis testing and its relevance to the industry, offering a practical understanding of A-B testing in a real-world context.', 'The 
chapter explains the features of SQL, emphasizing the ability to execute the same query in different systems with the same setup.', 'The chapter explores the concept of a table in a database as a collection of data in tabular form, consisting of rows and columns.', 'Constraints are restrictions specified by the user while creating a table to ensure data integrity.']}, {'end': 20843.142, 'segs': [{'end': 18654.752, 'src': 'embed', 'start': 18624.768, 'weight': 1, 'content': [{'end': 18630.792, 'text': 'and the syntax for this is INSERT INTO, which is the keyword used in an insert query. Following that,', 'start': 18624.768, 'duration': 6.024}, {'end': 18634.574, 'text': 'we have to specify the table name and columns. After that,', 'start': 18631.072, 'duration': 3.502}, {'end': 18636.035, 'text': 'we have to specify the values.', 'start': 18634.654, 'duration': 1.381}, {'end': 18640.538, 'text': "Let's look at the example to clearly understand how exactly this query works.", 'start': 18636.576, 'duration': 3.962}, {'end': 18644.12, 'text': 'So this is the example for the insert query.', 'start': 18641.779, 'duration': 2.341}, {'end': 18654.752, 'text': 'So we are going to insert first name as Manoj, last name as Sharma, address as 07 MG Road, city as Jaipur, and marks as 438.', 'start': 18644.861, 'duration': 9.891}], 'summary': 'The INSERT INTO query adds a record with specific data into a table, such as inserting a record with first name Manoj, last name Sharma, address 07 MG Road, city Jaipur, and marks 438.', 'duration': 29.984, 'max_score': 18624.768, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA18624768.jpg'}, {'end': 18789.8, 'src': 'embed', 'start': 18759.331, 'weight': 8, 'content': [{'end': 18761.211, 'text': 'The next aggregate function is SUM.', 'start': 18759.331, 'duration': 1.88}, {'end': 18765.923, 'text': 'The SUM function returns the total sum of that particular column.', 'start': 
18762, 'duration': 3.923}, {'end': 18771.407, 'text': 'The syntax is SELECT SUM(column) FROM table WHERE condition.', 'start': 18765.923, 'duration': 5.484}, {'end': 18774.689, 'text': 'So let me execute this example. In this example,', 'start': 18771.407, 'duration': 3.282}, {'end': 18777.631, 'text': "I'm trying to find the total marks scored by all the students.", 'start': 18774.929, 'duration': 2.702}, {'end': 18779.513, 'text': 'So let me select this query.', 'start': 18778.312, 'duration': 1.201}, {'end': 18786.858, 'text': 'The total marks scored by all the students is 2882.', 'start': 18781.274, 'duration': 5.584}, {'end': 18789.8, 'text': 'Moving on to the next aggregate function, that is, MIN.', 'start': 18786.858, 'duration': 2.942}], 'summary': 'The total marks scored by all students is 2882.', 'duration': 30.469, 'max_score': 18759.331, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA18759331.jpg'}, {'end': 18867.359, 'src': 'embed', 'start': 18837.878, 'weight': 0, 'content': [{'end': 18841.739, 'text': 'So we get the output: Bharat Singh has scored 580 marks.', 'start': 18837.878, 'duration': 3.861}, {'end': 18845.54, 'text': 'So 580 is the maximum marks in the entire table.', 'start': 18842.259, 'duration': 3.281}, {'end': 18853.415, 'text': 'The next SQL query is GROUP BY. GROUP BY is used to arrange similar types of data into groups.', 'start': 18846.934, 'duration': 6.481}, {'end': 18861.557, 'text': 'For instance, if a column in a table contains similar data or values in different rows, then we can use GROUP BY to group the data.', 'start': 18853.796, 'duration': 7.761}, {'end': 18867.359, 'text': 'The syntax is SELECT column FROM table WHERE condition GROUP BY the same column.', 'start': 18861.557, 'duration': 5.802}], 'summary': 'Bharat Singh scored 580 marks, the maximum in the table.', 'duration': 29.481, 'max_score': 18837.878, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA18837878.jpg'}, {'end': 19059.58, 'src': 'embed', 'start': 19027.926, 'weight': 9, 'content': [{'end': 19031.629, 'text': 'Those are IS NULL and IS NOT NULL. First,', 'start': 19027.926, 'duration': 3.703}, {'end': 19034.591, 'text': "let's look at the IS NULL operator.", 'start': 19031.769, 'duration': 2.822}, {'end': 19044.379, 'text': 'The IS NULL operator is used to test for empty values, and the syntax for this is SELECT columns FROM table WHERE column IS NULL.', 'start': 19034.591, 'duration': 9.788}, {'end': 19046.561, 'text': 'So let me execute this example.', 'start': 19044.94, 'duration': 1.621}, {'end': 19049.823, 'text': 'I want to display the names of students whose marks are null.', 'start': 19046.581, 'duration': 3.242}, {'end': 19059.58, 'text': 'So let me select this query and execute it. Since all the students have marks, no null values appear in the output.', 'start': 19050.284, 'duration': 9.296}], 'summary': "The IS NULL operator tests for empty values, but in the given example, all students have marks with no null values.", 'duration': 31.654, 'max_score': 19027.926, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA19027926.jpg'}, {'end': 19530.601, 'src': 'embed', 'start': 19505.763, 'weight': 13, 'content': [{'end': 19512.147, 'text': 'Then the next step is to clean the data of redundancies. Redundancies can be irregularities in the data.', 'start': 19505.763, 'duration': 6.384}, {'end': 19518.411, 'text': 'They can be variables or columns that are not necessary for making our conclusions or interpretations,', 'start': 19512.227, 'duration': 6.184}, {'end': 19523.614, 'text': 'so we can just remove them, or outliers, which can cause noise in the data', 'start': 19518.411, 'duration': 5.203}, {'end': 19528.637, 'text': 'or, you know, lead us to overfit or underfit the 
model when we are working on model building as well.', 'start': 19523.614, 'duration': 5.023}, {'end': 19530.601, 'text': 'So this is the second step, guys.', 'start': 19529.161, 'duration': 1.44}], 'summary': 'Clean data to remove redundancies and outliers, ensuring model accuracy.', 'duration': 24.838, 'max_score': 19505.763, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA19505763.jpg'}, {'end': 19586.899, 'src': 'embed', 'start': 19562.201, 'weight': 5, 'content': [{'end': 19567.983, 'text': "And if you're still looking for shortcuts, like if you just want to understand how it really works,", 'start': 19562.201, 'duration': 5.782}, {'end': 19571.985, 'text': 'we have a cheat sheet as well, which you can refer to for working in Jupyter Notebook.', 'start': 19568.023, 'duration': 3.962}, {'end': 19575.627, 'text': "And if you're looking for installation and everything, we have an Anaconda tutorial as well.", 'start': 19572.425, 'duration': 3.202}, {'end': 19579.708, 'text': "So the very first thing you have to do is import certain libraries that you're going to need.", 'start': 19576.267, 'duration': 3.441}, {'end': 19583.378, 'text': "So I'm going to import pandas with the alias pd.", 'start': 19580.149, 'duration': 3.229}, {'end': 19586.899, 'text': "I'm going to import a few other libraries that you may need.", 'start': 19583.858, 'duration': 3.041}], 'summary': 'Learn shortcuts, use the cheat sheet, and import pandas in Jupyter Notebook.', 'duration': 24.698, 'max_score': 19562.201, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA19562201.jpg'}, {'end': 20285.414, 'src': 'embed', 'start': 20249.848, 'weight': 18, 'content': [{'end': 20255.472, 'text': "Now I'm going to put it inside a heatmap, guys, using the sns, or seaborn, library that I have.", 'start': 20249.848, 'duration': 5.624}, {'end': 20259.316, 'text': "sns is basically the alias that 
I'm using for importing the library.", 'start': 20255.914, 'duration': 3.402}, {'end': 20266.661, 'text': "So I'm going to use the heatmap to actually show you what it looks like.", 'start': 20259.336, 'duration': 7.325}, {'end': 20285.414, 'text': 'And so we have xticklabels equal to the correlation columns, and then we have yticklabels.', 'start': 20267.782, 'duration': 17.632}], 'summary': 'Using the seaborn (sns) library to create a heatmap to visualize data.', 'duration': 35.566, 'max_score': 20249.848, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA20249848.jpg'}, {'end': 20352.415, 'src': 'embed', 'start': 20323.59, 'weight': 7, 'content': [{'end': 20328.371, 'text': 'But we do not have a lot of integer values; we have a few categorical values as well.', 'start': 20323.59, 'duration': 4.781}, {'end': 20333.212, 'text': 'So it is not that defining for our data set.', 'start': 20329.051, 'duration': 4.161}, {'end': 20341.393, 'text': "But if we change these to dummy values, we will be able to get a specific value for writing score and, let's say, for math score.", 'start': 20333.512, 'duration': 7.881}, {'end': 20347.052, 'text': 'There is a variability of 0.8, and the correlation is almost the same for everything here.', 'start': 20341.433, 'duration': 5.619}, {'end': 20352.415, 'text': "So we will not rely on this much more. On to the next step for our data set, guys.", 'start': 20347.312, 'duration': 5.103}], 'summary': 'Limited integer values, some categorical values, variability of 0.8, and correlation; not relied on for next steps.', 'duration': 28.825, 'max_score': 20323.59, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA20323590.jpg'}, {'end': 20458.31, 'src': 'embed', 'start': 20416.433, 'weight': 10, 'content': [{'end': 20424.419, 'text': "All right, x has to be, let's say, math score.", 'start': 20416.433, 'duration': 7.986}, {'end': 
20428.662, 'text': "We'll take y as reading score.", 'start': 20424.799, 'duration': 3.863}, {'end': 20433.166, 'text': "We'll take the hue as gender.", 'start': 20429.263, 'duration': 3.903}, {'end': 20440.872, 'text': 'Data is equal to student.', 'start': 20438.33, 'duration': 2.542}, {'end': 20447.845, 'text': 'All right, so we have a scatter plot here, and we have added the hue as well.', 'start': 20444.724, 'duration': 3.121}, {'end': 20454.629, 'text': 'So, as you can see, guys, all these values in blue dots are female students,', 'start': 20448.086, 'duration': 6.543}, {'end': 20458.31, 'text': 'and the others are their male counterparts, and they are.', 'start': 20454.629, 'duration': 3.681}], 'summary': "Using x as math score, y as reading score, and hue as gender, a scatter plot was created to compare male and female students' performance.", 'duration': 41.877, 'max_score': 20416.433, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA20416433.jpg'}], 'start': 18601.786, 'title': 'Sql queries, eda, and data visualization', 'summary': 'Covers SQL queries for filtering and inserting, aggregate functions, data exploration including EDA, data cleaning, and relationship analysis, and data visualization techniques using scatter plots, histograms, and box plots.', 'chapters': [{'end': 18933.388, 'start': 18601.786, 'title': 'Sql queries and aggregate functions', 'summary': 'Covers SQL queries for filtering and inserting, aggregate functions such as COUNT, AVG, SUM, MIN, and MAX, and the GROUP BY and HAVING clauses, with examples like inserting data, counting rows, finding averages, totals, minimum and maximum values, grouping data, and using the HAVING clause.', 'duration': 331.602, 'highlights': ['The count aggregate function returns the number of rows that match specified criteria, with an example showing the count of the student ID as 6.', 'The average aggregate function returns the 
average value of a numeric column, with an example showing the average mark scored by all the students as 480.333.', 'The sum aggregate function returns the total sum of a particular column, with an example showing the total marks scored by all the students as 2882.', 'The max aggregate function returns the largest value from the selected column, with an example showing Bharat Singh has scored 580 marks, the maximum in the entire table.', 'The min aggregate function returns the smallest value of the selected column, with an example showing Ashok Sinha has scored 385 marks, the minimum in the entire table.']}, {'end': 19447.393, 'start': 18933.569, 'title': 'Sql queries and data exploration', 'summary': 'Covers SQL queries including filtering, sorting, null values, update, delete, the IN and BETWEEN operators, and aliases, and concludes with an introduction to exploratory data analysis (EDA) with an emphasis on understanding, cleaning, and deriving insights from data.', 'duration': 513.824, 'highlights': ['The chapter covers SQL queries including filtering, sorting, null values, update, delete, the IN and BETWEEN operators, and aliases, and concludes with an introduction to exploratory data analysis (EDA) with an emphasis on understanding, cleaning, and deriving insights from data.', 'The ORDER BY keyword is used to sort the result set in ascending or descending order.', 'The IN operator is used to specify multiple values inside the WHERE clause, acting as multiple OR conditions.', 'The BETWEEN operator selects values within a specified range, given its beginning and end values.', 'Exploratory data analysis (EDA) aims to understand, clean, and derive insights from data, ensuring it is free of dependencies and null values.']}, {'end': 20013.165, 'start': 19447.713, 'title': 'Exploratory data analysis (eda)', 'summary': 'Introduces the objectives of EDA, including identifying and cleaning faulty data points, understanding the relationship between variables, and analyzing the data to make conclusions. The key steps involved in EDA include understanding the variables, cleaning redundancies, and analyzing the relationship between variables for a given data set.', 'duration': 565.452, 'highlights': ['Understanding the variables The first step in EDA involves understanding the variables in the data set, including the number of columns, rows, and the nature of the data, which helps in gaining insights into the structure of the data.', 'Cleaning the data from redundancies The second step in EDA is to clean the data of redundancies, irregularities, or outliers that may cause noise in the data and lead to an overfit or underfit model, ultimately improving data quality and reliability.', 'Analyzing the relationship between variables The final step in EDA involves analyzing the relationship between variables, which provides valuable insights for making conclusions and interpretations in the data analysis process.']}, {'end': 20372.949, 'start': 20013.686, 'title': 'Data cleaning and relationship analysis', 'summary': 'Covers the process of 
cleaning the data by removing redundant columns and checking for outliers, followed by the analysis of relationships among variables using correlation matrix and pair plot.', 'duration': 359.263, 'highlights': ['The process of cleaning the data involves removing redundant columns such as race, ethnicity, and parental level of education, as they are considered unimportant for evaluation. Reduction of unnecessary columns: race, ethnicity, and parental level of education.', 'Outlier analysis is conducted to identify data points that significantly differ from other observations, which may potentially cause serious problems in statistical analysis. Explanation of outliers and the importance of identifying and handling them to ensure data quality.', "The relationship analysis includes the use of a correlation matrix to summarize the correlation coefficients between variables, providing a wider perspective on the dataset and serving as an input for more advanced analysis. Utilization of correlation matrix to analyze relationships between variables and assess the dataset's characteristics.", 'The creation of a pair plot allows for the visualization of the relationships between two variables, whether they are continuous, categorical, or boolean, providing further insight into the dataset. 
Use of a pair plot to visualize pairwise relationships in the dataset.']}, {'end': 20843.142, 'start': 20373.504, 'title': 'Data visualization techniques', 'summary': 'Discusses the use of scatter plots, histograms, and box plots to analyze relationships between variables in a small data set, emphasizing the effectiveness of scatter plots and providing insights into the data structure and key variables.', 'duration': 469.638, 'highlights': ['Scatter plot: Used to show the relationship between math score and reading score, and how gender, test preparation course, and lunch relate to the scores, emphasizing the importance of this visualization technique for data analysis.', 'Histogram: Used to display the distribution of math, reading, and writing scores, highlighting the concentration of values within specific ranges and its relevance in analyzing data relationships.', 'Box plot: Used to examine the distribution of math, reading, and writing scores, emphasizing its role in understanding the relationship between variables in the data set.']}], 'duration': 2241.356, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA18601786.jpg', 'highlights': ['The COUNT aggregate function returns the number of rows that match specified criteria, with an example showing a count of 6 student IDs.', 'The SUM aggregate function returns the total sum of a particular column, with an example showing the total marks scored by all the students as 2882.', 'The AVG aggregate function returns the average value of a numeric column, with an example showing the average mark scored by all the students as 480.333.', 'The MAX aggregate function returns the largest value from the selected column, with an example showing that Bharat Singh scored 580 marks, the maximum in the entire table.', 'The MIN aggregate function returns the smallest value of the selected column,
with an example showing that Ashok Sinha scored 385 marks, the minimum in the entire table.', 'The chapter covers SQL queries including filtering, sorting, null values, UPDATE, DELETE, the IN and BETWEEN operators, and aliases, and concludes with an introduction to exploratory data analysis (EDA) with emphasis on understanding, cleaning, and deriving insights from data.', 'The ORDER BY keyword is used to sort the result set in ascending or descending order.', 'The IN operator is used to specify multiple values inside the WHERE clause, acting as multiple OR conditions.', 'The BETWEEN operator selects values within a specified range and requires both the beginning and end values.', 'Exploratory data analysis (EDA) aims to understand, clean, and derive insights from data, ensuring it is free of dependencies and null values.', 'Understanding the variables: the first step in EDA involves understanding the variables in the data set, including the number of columns and rows and the nature of the data, which gives insight into the structure of the data.', 'Cleaning the data of redundancies: the second step in EDA is to clean the data of redundancies, irregularities, or outliers that may add noise to the data, so that the model does not overfit or underfit, ultimately improving data quality and reliability.', 'Analyzing the relationship between variables: the final step in EDA involves analyzing the relationship between variables, which provides valuable insights for drawing conclusions and interpretations in the data analysis process.', 'The process of cleaning the data involves removing redundant columns, such as race/ethnicity and parental level of education, as they are considered unimportant for evaluation.', 'Outlier analysis is conducted to identify data points that differ significantly from other observations, since they can cause serious problems in statistical analysis.', 'The relationship analysis includes the use of a correlation
matrix to summarize the correlation coefficients between variables, providing a wider perspective on the dataset and serving as an input for more advanced analysis.', 'A pair plot visualizes the relationship between each pair of variables, providing further insight into the dataset.', 'Scatter plot: Used to show the relationship between math score and reading score, and how gender, test preparation course, and lunch relate to the scores, emphasizing the importance of this visualization technique for data analysis.', 'Histogram: Used to display the distribution of math, reading, and writing scores, highlighting the concentration of values within specific ranges and its relevance in analyzing data relationships.', 'Box plot: Used to examine the distribution of math, reading, and writing scores, emphasizing its role in understanding the relationship between variables in the data set.']}, {'end': 22875.471, 'segs': [{'end': 20869.899, 'src': 'embed', 'start': 20843.162, 'weight': 4, 'content': [{'end': 20848.887, 'text': 'So we have to keep those, and then we actually check for unique values of different tables.', 'start': 20843.162, 'duration': 5.725}, {'end': 20856.338, 'text': 'That way we can make out what exactly is necessary for our table. Then came the second part, in which we clean the data.', 'start': 20849.468, 'duration': 6.87}, {'end': 20864.357, 'text': 'So we checked for null values; since there were no null values in this data set, we did not have to change any values.', 'start': 20856.779, 'duration': 7.578}, {'end': 20869.899, 'text': "So if you had any null values, let's say, you could have handled them with dropna,", 'start': 20864.478, 'duration': 5.421}], 'summary': 'The data cleaning process involved checking for unique values and handling null values, which were not found in the dataset.',
'duration': 26.737, 'max_score': 20843.162, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA20843162.jpg'}, {'end': 21174.674, 'src': 'embed', 'start': 21140.583, 'weight': 0, 'content': [{'end': 21142.805, 'text': 'we have the historic record of a company.', 'start': 21140.583, 'duration': 2.222}, {'end': 21147.029, 'text': "how much profit has the company generated over, let's say, the last 10 years?", 'start': 21142.805, 'duration': 4.224}, {'end': 21151.313, 'text': 'and now we want to forecast the profit for the next couple of quarters.', 'start': 21147.029, 'duration': 4.284}, {'end': 21162.623, 'text': "Or, for example, we want to predict how many products we need to manufacture to meet the demand expected next year.", 'start': 21152.274, 'duration': 10.349}, {'end': 21174.674, 'text': "Again, we can understand that by looking at the historical record and getting a better idea, because there's a gap between demand and supply.", 'start': 21164.985, 'duration': 9.689}], 'summary': "Analyzing 10 years of company profit data to forecast profit for the next couple of quarters and product demand for next year.", 'duration': 34.091, 'max_score': 21140.583, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA21140583.jpg'}, {'end': 22555.761, 'src': 'embed', 'start': 22528.394, 'weight': 3, 'content': [{'end': 22537.198, 'text': "So if there's a date associated with our data set, it makes a lot of sense to make that column the index of our Pandas data frame.", 'start': 22528.394, 'duration': 8.804}, {'end': 22544.295, 'text': "So that is what I'm going to do here: make sure that the date column becomes the index.", 'start': 22537.751, 'duration': 6.544}, {'end': 22555.761, 'text': 'We observe that the month column, which had dates in it, has become our index, and the passengers column contains the number of
passengers in that particular month.', 'start': 22544.315, 'duration': 11.446}], 'summary': 'Setting the date column as the index of a pandas data frame.', 'duration': 27.367, 'max_score': 22528.394, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA22528394.jpg'}, {'end': 22693.539, 'src': 'embed', 'start': 22669.085, 'weight': 2, 'content': [{'end': 22677.048, 'text': 'What we are going to do is use Pandas to create a moving average; we are taking a 12-month moving average.', 'start': 22669.085, 'duration': 7.963}, {'end': 22687.933, 'text': 'Remember that our data set is monthly, so on our indexed data set,', 'start': 22677.108, 'duration': 10.825}, {'end': 22693.539, 'text': 'which is indexed by month, we want to take a rolling mean over 12 months.', 'start': 22687.933, 'duration': 5.606}], 'summary': 'Using pandas to create a 12-month moving average for the monthly dataset.', 'duration': 24.454, 'max_score': 22669.085, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA22669085.jpg'}], 'start': 20843.162, 'title': 'Time series analysis', 'summary': 'Covers data cleaning, relationship analysis, time series analysis in business, time series stationarity, the ADF test, the ARIMA model, and application to airline passenger data, with methods like correlation matrices, heat maps, scatter plots, histograms, and box plots, emphasizing the impact on business decisions and operations.', 'chapters': [{'end': 21193.609, 'start': 20843.162, 'title': 'Data cleaning and relationship analysis', 'summary': 'Discusses the process of cleaning data by handling null values, dropping redundant columns, and conducting relationship analysis using a correlation matrix, heat maps, scatter plots, histograms, and box plots.
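Making the date column the index and taking a 12-month rolling mean, as described in these segments, might look like the sketch below. The column names Month and Passengers and the linear toy values are assumptions for illustration; with the real file one would typically load it with pandas.read_csv first. Because the data are monthly, window=12 covers twelve months, so the first eleven rolling values are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the airline-passenger file: a Month column of
# year-month strings and a Passengers column of made-up counts.
raw = pd.DataFrame({
    "Month": pd.date_range("1949-01-01", periods=24, freq="MS").strftime("%Y-%m"),
    "Passengers": 100 + 2 * np.arange(24),
})

# Parse the date strings and make them the index of the data frame.
raw["Month"] = pd.to_datetime(raw["Month"])
ts = raw.set_index("Month")["Passengers"]

# Rolling statistics over 12 observations, i.e. 12 months for monthly data.
rolling_mean = ts.rolling(window=12).mean()
rolling_std = ts.rolling(window=12).std()

print(rolling_mean.head(13))
```

Plotting ts, rolling_mean and rolling_std on one chart gives the visual stationarity check the video uses.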
It also introduces the concept of time series analysis and its use cases in forecasting and prediction.', 'duration': 350.447, 'highlights': ['The process of cleaning data involves handling null values by replacing them with appropriate values, dropping redundant columns, and analyzing the relationship between different variables. The speaker discusses the steps involved in cleaning data, including handling null values by replacing them with appropriate values, dropping redundant columns, and analyzing the relationship between different variables.', 'Introducing the concept of time series analysis and its use cases in forecasting and prediction for various types of data, such as stock market data, weather data, and business processes. The speaker explains the concept of time series analysis and its use cases in forecasting and prediction for various types of data, such as stock market data, weather data, and business processes.']}, {'end': 21830.203, 'start': 21194.03, 'title': 'Time series analysis in business', 'summary': 'Explains the relevance of time series analysis in understanding trends, seasonality, and stationarity in data, emphasizing the impact on business decisions and operations, with examples including sales trends, seasonal patterns, and the importance of stationarity. It also highlights the association of time series analysis with real-world business problems and the need for data to be stationary for time series models to make accurate predictions.', 'duration': 636.173, 'highlights': ['The relevance of time series analysis in understanding trends, seasonality, and stationarity in data, emphasizing the impact on business decisions and operations.
Time series analysis is relevant to day-to-day business problems and helps in understanding trends, seasonality, and stationarity in data, impacting business decisions and operations.', 'The association of time series analysis with real-world business problems and the need for data to be stationary for time series models to make accurate predictions. Time series analysis is associated with real-world business problems, and data needs to be stationary for time series models to make accurate predictions.', 'Examples including sales trends, seasonal patterns, and the importance of stationarity in data for accurate predictions in business operations. Examples of sales trends, seasonal patterns, and the importance of stationarity in data for accurate predictions in business operations are highlighted.']}, {'end': 22164.277, 'start': 21830.924, 'title': 'Time series stationarity', 'summary': 'Explains the concept of time series stationarity, stating that for time series data to be stationary, the mean, variance, and autocovariance should be more or less constant over time, and introduces methods like the Augmented Dickey-Fuller test and graph visualization to check for stationarity.', 'duration': 333.353, 'highlights': ['The mean, variance, and autocovariance of time series data should be more or less constant over time for it to be considered stationary. It is explained that for time series data to be stationary, the mean, variance, and autocovariance should be more or less constant.', 'Introduction of the Augmented Dickey-Fuller test as a method to test for stationarity, with a significance level of 0.05 for rejecting the null hypothesis of non-stationarity. The Augmented Dickey-Fuller test is introduced as a method to test for stationarity, with a significance level of 0.05 for rejecting the null hypothesis of non-stationarity.', 'Recommendation to visually inspect the data set on a chart to determine stationarity.
It is suggested to visually inspect the data set on a chart to determine whether the values are stationary.']}, {'end': 22544.295, 'start': 22164.277, 'title': 'Understanding time series analysis', 'summary': 'Discusses the ADF test, stationary patterns, and the AR, MA, ARMA, and ARIMA models with examples, and the application of time series analysis to airline passenger data to forecast demand and make business decisions.', 'duration': 380.018, 'highlights': ['The chapter discusses the ADF test, stationary patterns, and the AR, MA, ARMA, and ARIMA models. It explains the limitations of the ADF test and the importance of visual graphing to identify a stationary pattern in a dataset.', 'The chapter explains the AR, MA, ARMA, and ARIMA models with examples. It details the components of the ARIMA model, namely the autoregressive (AR), integrated (I), and moving average (MA) parts, and the concept of differencing to convert non-stationary data to stationary data.', 'The application of time series analysis to airline passenger data to forecast demand and make business decisions is demonstrated. It mentions the availability of a dataset on the number of airline passengers and the objective of forecasting demand for business decisions.']}, {'end': 22875.471, 'start': 22544.315, 'title': 'Time series analysis', 'summary': 'Discusses exploring a time series dataset, identifying non-stationarity and seasonality, and creating a 12-month moving average to analyze the number of air passengers over time.', 'duration': 331.156, 'highlights': ['The dataset is non-stationary, showing an increasing trend over time. The data set is observed to be non-stationary, with a clear upward trend over time.', 'The dataset exhibits seasonality, likely due to vacation periods such as summer and winter vacations.
The dataset shows a strong seasonal pattern, possibly due to increased travel during vacation periods.', 'Creating a 12-month moving average to analyze the air passenger data. Using Pandas, a 12-month moving average is created to analyze the monthly air passenger data.']}], 'duration': 2032.309, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA20843162.jpg', 'highlights': ['Time series analysis is relevant to day-to-day business problems and helps in understanding trends, seasonality, and stationarity in data, impacting business decisions and operations.', 'The process of cleaning data involves handling null values by replacing them with appropriate values, dropping redundant columns, and analyzing the relationship between different variables.', 'The mean, variance, and autocovariance of time series data should be more or less constant over time for it to be considered stationary.', 'The Augmented Dickey-Fuller test is introduced as a method to test for stationarity, with a significance level of 0.05 for rejecting the null hypothesis of non-stationarity.', 'The chapter explains the AR, MA, ARMA, and ARIMA models with examples.']}, {'end': 25445.079, 'segs': [{'end': 22970.526, 'src': 'embed', 'start': 22944.383, 'weight': 0, 'content': [{'end': 22950.758, 'text': 'And similarly, if we look at the rolling mean, the 12-month mean, we also observe that it is clearly going up,', 'start': 22944.383, 'duration': 6.375}, {'end': 22954.12, 'text': 'so it changes with time, which means the series is clearly non-stationary.', 'start': 22950.758, 'duration': 3.362}, {'end': 22963.163, 'text': 'Also observe that both the standard deviation and rolling mean graphs start at a different point compared to the blue line.', 'start': 22954.6, 'duration': 8.563}, {'end': 22966.164, 'text': "That's because we are taking a 12-month rolling average.", 'start': 22963.303, 'duration': 2.861}, {'end': 22970.526, 'text': "We don't have the value for the first
11 months for the rolling mean.", 'start': 22966.604, 'duration': 3.922}], 'summary': 'The 12-month rolling mean is clearly rising, so the series is non-stationary; the rolling curves start later than the original data because of the 12-month window.', 'duration': 26.143, 'max_score': 22944.383, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA22944383.jpg'}, {'end': 23100.084, 'src': 'embed', 'start': 23076.149, 'weight': 8, 'content': [{'end': 23083.591, 'text': 'or what the test statistic, the first value, should be for this data set to be stationary.', 'start': 23076.149, 'duration': 7.442}, {'end': 23091.072, 'text': "So at the five percent significance level, which is the level we are checking, the value should be below -2.88,", 'start': 23083.591, 'duration': 7.481}, {'end': 23094.973, 'text': 'but it is actually positive: 0.81.', 'start': 23091.072, 'duration': 3.901}, {'end': 23097.354, 'text': "So it's quite different from what it should have been.", 'start': 23094.973, 'duration': 2.381}, {'end': 23100.084, 'text': 'So this is what we mean by critical values.', 'start': 23098.143, 'duration': 1.941}], 'summary': "For the dataset to be stationary at the 5% significance level, the test statistic should be below the critical value of -2.88, but it is actually 0.81.", 'duration': 23.935, 'max_score': 23076.149, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA23076149.jpg'}, {'end': 23733.844, 'src': 'embed', 'start': 23643.821, 'weight': 2, 'content': [{'end': 23651.227, 'text': "So that's what we mean by differencing: subtracting the previous value from the current value.", 'start': 23643.821, 'duration': 7.406}, {'end': 23659.862, 'text': "Okay, so let's go ahead and test the stationarity of this data set as well.", 'start': 23653.189, 'duration': 6.673}, {'end': 23675.313, 'text': 'So this has not actually made the data fully stationary so far, because we see that
the p-value is slightly more than 0.05.', 'start': 23661.143, 'duration': 14.17}, {'end': 23678.715, 'text': 'So it has not done it, at least in this one step.', 'start': 23675.313, 'duration': 3.402}, {'end': 23682.677, 'text': 'Now the good thing about differencing is that we can difference the data multiple times.', 'start': 23679.035, 'duration': 3.642}, {'end': 23686.96, 'text': 'We can use differencing several times to make our data set stationary.', 'start': 23683.098, 'duration': 3.862}, {'end': 23695.585, 'text': 'Right now we did it just once, and for now we are going to ignore this slight non-stationary component in the data set.', 'start': 23687.44, 'duration': 8.145}, {'end': 23703.438, 'text': 'And finally, we are going to use a tool available in statsmodels: seasonal decomposition.', 'start': 23698.015, 'duration': 5.423}, {'end': 23705.279, 'text': 'So this is also a very nice tool.', 'start': 23703.839, 'duration': 1.44}, {'end': 23710.142, 'text': 'We are not going to use it for now; this is just to showcase how it works.', 'start': 23705.82, 'duration': 4.322}, {'end': 23717.067, 'text': 'So on the top graph we have our original data set.', 'start': 23713.144, 'duration': 3.923}, {'end': 23724.177, 'text': 'In the graph after that, we have the general trend of the data set.', 'start': 23719.248, 'duration': 4.929}, {'end': 23733.844, 'text': 'So the seasonal decomposition separates out both the trend and the seasonal component from the original data set.', 'start': 23724.217, 'duration': 9.627}], 'summary': 'Differencing can be applied multiple times to make the data stationary, and seasonal decomposition separates the trend and seasonal components.', 'duration': 90.023, 'max_score': 23643.821, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA23643821.jpg'}, {'end': 24024.296, 'src': 'embed', 'start': 23997.081, 'weight': 4, 'content': [{'end': 24006.231,
'text': 'Now our model is the autoregressive moving average model, quite simply a combination of those, and it is represented by two orders.', 'start': 23997.081, 'duration': 9.15}, {'end': 24008.552, 'text': 'The first value is for the autoregressive part.', 'start': 24006.732, 'duration': 1.82}, {'end': 24016.394, 'text': 'So this is an order-one autoregressive and, again, an order-one moving average model.', 'start': 24008.592, 'duration': 7.802}, {'end': 24024.296, 'text': "So in this we are saying that the relationship is with both yesterday's price and yesterday's noise.", 'start': 24016.874, 'duration': 7.422}], 'summary': "Autoregressive moving average model of order one in both the autoregressive and moving average parts, representing the relationship with yesterday's price and yesterday's noise.", 'duration': 27.215, 'max_score': 23997.081, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA23997081.jpg'}, {'end': 24349.13, 'src': 'embed', 'start': 24322.047, 'weight': 10, 'content': [{'end': 24327.928, 'text': 'So to get an approximate idea of how good our model is: the lower the value, the better it is.', 'start': 24322.047, 'duration': 5.881}, {'end': 24334.926, 'text': "So let's say that even though we know it should be a (2, 2) order, we are running a (2, 0) order here.", 'start': 24328.644, 'duration': 6.282}, {'end': 24336.566, 'text': "Let's see how it performs.", 'start': 24335.486, 'duration': 1.08}, {'end': 24342.388, 'text': 'So in red we see our predicted values.', 'start': 24339.087, 'duration': 3.301}, {'end': 24349.13, 'text': 'They seem to be a little different from our actual values, and the error that we are getting is about 1.5.', 'start': 24343.689, 'duration': 5.441}], 'summary': 'Model error is 1.5, indicating suboptimal performance.', 'duration': 27.083, 'max_score': 24322.047, 'thumbnail':
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA24322047.jpg'}, {'end': 25068.577, 'src': 'embed', 'start': 25034.806, 'weight': 9, 'content': [{'end': 25036.387, 'text': 'How many numerical values are there?', 'start': 25034.806, 'duration': 1.581}, {'end': 25038.728, 'text': 'What kind of data types are there inside your data?', 'start': 25036.427, 'duration': 2.301}, {'end': 25041.265, 'text': 'Is it a CSV file or not? And so on.', 'start': 25039.148, 'duration': 2.117}, {'end': 25048.288, 'text': 'So you have to figure out a lot of things during data exploration, and after that you can figure out how to clean your data. By cleaning,', 'start': 25041.265, 'duration': 7.023}, {'end': 25052.27, 'text': 'I mean you have to figure out the redundancies that might hinder your model.', 'start': 25048.348, 'duration': 3.922}, {'end': 25054.471, 'text': 'So for that you have to check for null values.', 'start': 25052.33, 'duration': 2.141}, {'end': 25062.854, 'text': 'You have to check for missing values, and then you have to figure out which columns will actually work better if you put them inside your model,', 'start': 25054.511, 'duration': 8.343}, {'end': 25068.577, 'text': 'and which variables are redundant, that is, which columns you can actually remove without making a difference to your model.', 'start': 25062.854, 'duration': 5.723}], 'summary': 'Perform data exploration to identify numerical values, data types, and potential redundancies before cleaning and selecting columns for model inclusion.', 'duration': 33.771, 'max_score': 25034.806, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA25034806.jpg'}, {'end': 25278.924, 'src': 'embed', 'start': 25253.626, 'weight': 14, 'content': [{'end': 25258.89, 'text': "which is more than 90%, and if you get it the first time, it's very good,", 'start': 25253.626, 'duration': 5.264}, {'end': 25262.372,
'text': "but it all depends on your data and the kind of model selection that you do.", 'start': 25258.89, 'duration': 3.482}, {'end': 25265.161, 'text': "So let's take a look at the next topic in our session, guys.", 'start': 25263.021, 'duration': 2.14}, {'end': 25270.642, 'text': "So this is basically where I'm going to perform predictive analysis using Python on a data set.", 'start': 25265.882, 'duration': 4.76}, {'end': 25278.924, 'text': 'So I have a problem statement in which I have a data set with certain variables, which has columns like, you know,', 'start': 25271.283, 'duration': 7.641}], 'summary': 'Introducing predictive analysis using Python on a data set; an accuracy above 90% on the first attempt is considered very good.', 'duration': 25.298, 'max_score': 25253.626, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA25253626.jpg'}, {'end': 25357.669, 'src': 'embed', 'start': 25329.492, 'weight': 15, 'content': [{'end': 25337.015, 'text': "Like, I'm going to use seaborn to check the relationships between the variables, basically for EDA (exploratory data analysis).", 'start': 25329.492, 'duration': 7.523}, {'end': 25341.457, 'text': "And if you guys don't know what EDA is, I suggest you check out another tutorial,", 'start': 25337.476, 'duration': 3.981}, {'end': 25347.92, 'text': "on exploratory data analysis, that we have on our YouTube channel, and then I'm going to import NumPy as well, just in case.", 'start': 25341.457, 'duration': 6.463}, {'end': 25357.669, 'text': "All right, and you can see, guys, I just have to press Shift and Enter, and this is why I'm using a Jupyter notebook,", 'start': 25350.828, 'duration': 6.841}], 'summary': 'Using seaborn for EDA, suggesting a tutorial, and importing NumPy in a Jupyter notebook.', 'duration': 28.177, 'max_score': 25329.492, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA25329492.jpg'}], 'start': 22875.471, 'title': 'Time
series data analysis', 'summary': 'Covers the concept of stationarity in time series data, methods for making non-stationary data stationary, forecasting airline passengers using ARIMA models, time series analysis basics, and predictive analytics steps, with a suggested model order of (2, 0) for airline passenger forecasting.', 'chapters': [{'end': 23409.908, 'start': 22875.471, 'title': 'Understanding stationarity in time series data', 'summary': 'Discusses the concept of stationarity in time series data, including the visual and statistical methods for checking stationarity, the Augmented Dickey-Fuller test, and techniques for converting non-stationary data to stationary, with a focus on the impact of applying logarithm and rolling mean subtraction methods.', 'duration': 534.437, 'highlights': ['The chapter explains the concept of stationarity in time series data and introduces the visual and statistical methods for checking stationarity, emphasizing the significance of the Augmented Dickey-Fuller test. Explanation of stationarity, visual and statistical methods for checking stationarity, emphasis on the Augmented Dickey-Fuller test', 'The impact of applying the logarithm method to the data set is discussed, highlighting the smoothing effect it has on the data. Discussion of the impact of applying the logarithm method, emphasis on the smoothing effect', "The combination of logarithm and rolling mean subtraction methods is introduced as a technique for converting non-stationary data to stationary, with a focus on assessing the impact on the data set's stationarity. Introduction of the combination technique, focus on assessing the impact on stationarity"]}, {'end': 24245.123, 'start': 23410.869, 'title': 'Making time series data stationary', 'summary': 'Discusses five methods to make non-stationary data stationary, including using a rolling average, differencing, and seasonal decomposition.
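The transformations named in this chapter (a log transform, subtracting a rolling mean, and differencing) each take a line or two in pandas. The sketch below is illustrative only: the series is synthetic, with monthly growth and a seasonal swing whose size grows with the level, standing in for the passenger counts. Differencing here means subtracting the previous value from the current one, and it can be repeated if one pass is not enough.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (an assumption): 2% growth per month and a
# seasonal swing whose amplitude grows with the level.
months = pd.date_range("1949-01-01", periods=48, freq="MS")
t = np.arange(48)
ts = pd.Series(100 * 1.02 ** t * (1 + 0.1 * np.sin(2 * np.pi * t / 12)), index=months)

# 1. Log transform: turns multiplicative growth into additive growth.
ts_log = np.log(ts)

# 2. Subtract the 12-month rolling mean of the log series (first 11 values are NaN).
log_minus_mean = (ts_log - ts_log.rolling(window=12).mean()).dropna()

# 3. First-order differencing: current value minus previous value.
log_diff = ts_log.diff().dropna()

# 4. Second-order differencing, if one pass leaves the series non-stationary.
log_diff2 = log_diff.diff().dropna()

print(len(ts), len(log_minus_mean), len(log_diff), len(log_diff2))
```

Each transformed series can then be re-tested with the ADF test and a rolling-statistics plot before modeling.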
It also explains the concepts of ARMA modeling, including autoregressive and moving average models, and highlights the process of determining the best order for running ARMA or ARIMA models.', 'duration': 834.254, 'highlights': ['The chapter discusses five methods to make non-stationary data stationary, including using a rolling average, differencing, and seasonal decomposition. It mentions testing stationarity using a rolling average and by subtracting the moving average from the log values, and also showcases the seasonal decomposition tool to separate the trend and seasonality from the original data set.', "Explains the concepts of ARMA modeling, including autoregressive and moving average models. It describes the relationship between today's price and previous prices in autoregressive models, and the relationship between today's price and the noise of previous days in moving average models.", 'Highlights the process of determining the best order for running ARMA or ARIMA models. It mentions using tools to find the best order for the autoregressive and moving average relationships, and identifies (2, 2) as the best order for the given data set.']}, {'end': 24641.805, 'start': 24245.163, 'title': 'Forecasting airline passengers', 'summary': 'Discusses the process of using ARIMA models to forecast airline passengers, and the iterative approach of refining model orders to minimize error, resulting in a suggested model order of (2, 0).
It also covers the challenges of working with transformed data and the need for additional features to improve model accuracy.', 'duration': 396.642, 'highlights': ['Iterative model refinement: the chapter emphasizes the iterative approach of refining ARIMA model orders to minimize error, resulting in a suggested model order of (2, 0).', 'Challenges with transformed data: it covers the challenges of working with transformed data, highlighting the need to follow mathematical procedures to convert predictions back to the original form.', 'Importance of additional features: it discusses the importance of incorporating additional features, such as GDP and wealth data, into the model to improve accuracy.']}, {'end': 24997.864, 'start': 24642.668, 'title': 'Time series analysis and predictive analytics', 'summary': 'Covers the basics of time series analysis, including the use of SARIMA models for seasonal data sets, and introduces the concept of predictive analytics, providing examples of its applications in campaign management, customer acquisition, budgeting and forecasting, stock prediction, fraud detection, promotions, pricing, and demand planning.', 'duration': 355.196, 'highlights': ['The chapter covers the basics of time series analysis, including the use of SARIMA models for seasonal data sets. The data set has a strong seasonal component, and SARIMA models are recommended for seasonal data sets.', 'Introduces the concept of predictive analytics and provides examples of its applications in various fields. Examples of applications include campaign management, customer acquisition, budgeting and forecasting, stock prediction, fraud detection, promotions, pricing, and demand planning.', 'Discusses the applications of predictive analysis in campaign management, customer acquisition, budgeting and forecasting, stock prediction, fraud detection, promotions, pricing, and demand planning. Predictive analysis can be used for various applications such as campaign management,
customer acquisition, budgeting and forecasting, stock prediction, fraud detection, promotions, pricing, and demand planning.']}, {'end': 25445.079, 'start': 24997.864, 'title': 'Predictive analysis steps', 'summary': 'Discusses the steps involved in predictive analysis, including data exploration, data cleaning, modeling, and performance analysis, emphasizing the importance of each step and the use of the linear regression model for beginners.', 'duration': 447.215, 'highlights': ['The chapter discusses the steps involved in predictive analysis, including data exploration, data cleaning, modeling, and performance analysis. It covers the process of data exploration, data cleaning, modeling, and performance analysis in the context of predictive analysis.', "The importance of each step in predictive analysis is emphasized, including the need to understand the data, clean it, select a suitable predictive model, and evaluate the model's performance. It emphasizes the significance of understanding data, cleaning it, selecting a suitable predictive model, and evaluating the model's performance.", 'The linear regression model is recommended for beginners because its simplicity makes it easy to learn properly. It recommends the use of the linear regression model for beginners as it is simple and facilitates proper learning.', 'The accuracy score of the model is highlighted, suggesting that a score above 70% is good for beginners, while a score around 90% is better for a good model. 
It highlights the importance of the accuracy score, suggesting that above 70% is good for beginners and around 90% is preferable for a good model.']}], 'duration': 2569.608, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA22875471.jpg', 'highlights': ['Introduction of the combination technique, focus on assessing the impact on stationarity', 'Discussion of the impact of applying the logarithm method, emphasis on the smoothing effect', 'Explanation of stationarity, visual and statistical methods for checking stationarity, emphasis on the Augmented Dickey-Fuller test', 'The chapter emphasizes the iterative approach of refining ARIMA model orders to minimize error, resulting in a suggested model order of 2, 0', 'The chapter discusses the steps involved in predictive analysis, including data exploration, data cleaning, modeling, and performance analysis', "The importance of each step in predictive analysis is emphasized, including the need to understand the data, clean it, select a suitable predictive model, and evaluate the model's performance", 'The linear regression model is recommended for beginners because its simplicity makes it easy to learn properly', 'The accuracy score of the model is highlighted, suggesting that a score above 70% is good for beginners, while a score around 90% is better for a good model', 'The chapter discusses five methods to make non-stationary data stationary, including using rolling average, differencing, and seasonal decomposition', 'Explains the concepts of ARMA modeling, including auto-regression and moving average models', 'The process of determining the best order for running ARMA or ARIMA models is highlighted', 'Challenges with Transformed Data: It covers the challenges of working with transformed data, highlighting the need to follow mathematical procedures to convert predictions back to the original form', 'Importance of Additional Features: It discusses the 
importance of incorporating additional features, such as GDP and wealth data, into the model to improve accuracy', 'The chapter covers the basics of time series analysis, including the use of SARIMA models for seasonal data sets', 'Introduces the concept of predictive analytics and provides examples of its applications in various fields', 'Discusses the applications of predictive analysis in campaign management, customer acquisition, budgeting and forecasting, stock prediction, fraud detection, promotions, pricing, and demand planning', 'The data set has a strong seasonal component, and SARIMA models are recommended for seasonal data sets', 'The impact of applying the logarithm method to the data set is discussed, highlighting the smoothing effect it has on the data', "The combination of logarithm and rolling mean subtraction methods is introduced as a technique for converting non-stationary data to stationary, with a focus on assessing the impact on the data set's stationarity", 'The chapter explains the concept of stationarity in time series data and introduces the visual and statistical methods for checking stationarity, emphasizing the significance of the Augmented Dickey-Fuller test']}, {'end': 26797.642, 'segs': [{'end': 25551.858, 'src': 'embed', 'start': 25517.252, 'weight': 3, 'content': [{'end': 25523.354, 'text': "and I'm using this example of a house data set because it's very common and this data set is very easy to find.", 'start': 25517.252, 'duration': 6.102}, {'end': 25531.376, 'text': 'You go on to Kaggle and you just look for a house prediction data set, and it will show you a lot of data sets that you can download from there.', 'start': 25523.374, 'duration': 8.002}, {'end': 25533.457, 'text': 'Okay, so you can find the data set on Kaggle, guys.', 'start': 25531.376, 'duration': 2.081}, {'end': 25535.878, 'text': 'Now we have checked the shape as well.', 'start': 25534.337, 'duration': 1.541}, {'end': 25537.158, 'text': "Okay, I'll use one more.", 
'start': 25536.118, 'duration': 1.04}, {'end': 25541.109, 'text': 'method, that is data.describe(). Right, this is callable.', 'start': 25537.526, 'duration': 3.583}, {'end': 25551.858, 'text': 'All right, so we have all these numerical values, and using the describe method we can get the 50th percentile, minimum, maximum, and standard deviation.', 'start': 25541.189, 'duration': 10.669}], 'summary': 'Using a common house dataset from Kaggle to get numerical summaries with the describe method.', 'duration': 34.606, 'max_score': 25517.252, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA25517252.jpg'}, {'end': 26266.445, 'src': 'embed', 'start': 26234.772, 'weight': 0, 'content': [{'end': 26237.293, 'text': "I'm fitting the training data after that.", 'start': 26234.772, 'duration': 2.521}, {'end': 26239.134, 'text': 'I am using it to predict the value.', 'start': 26237.413, 'duration': 1.721}, {'end': 26242.735, 'text': 'So now comes the part where we have to check the efficiency of the model.', 'start': 26239.914, 'duration': 2.821}, {'end': 26245.776, 'text': 'So for regression models, it is very easy, guys.', 'start': 26243.235, 'duration': 2.541}, {'end': 26249.332, 'text': 'You can just check the score.', 'start': 26247.751, 'duration': 1.581}, {'end': 26260.16, 'text': 'For this you have to pass a few values, X test and Y test, and we have the accuracy of 0.70, which is not bad, guys.', 'start': 26250.533, 'duration': 9.627}, {'end': 26266.445, 'text': "If you're using the model or the data set this big, it is quite predictable.", 'start': 26260.56, 'duration': 5.885}], 'summary': 'Model accuracy is 0.70 for regression, indicating good predictability with a large dataset.', 'duration': 31.673, 'max_score': 26234.772, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA26234772.jpg'}, {'end': 26413.715, 
'src': 'embed', 'start': 26384.822, 'weight': 1, 'content': [{'end': 
26388.326, 'text': 'So I start with an equals sign, enter AVERAGE, and open the bracket.', 'start': 26384.822, 'duration': 3.504}, {'end': 26393.21, 'text': "Now I'll have to select the range of cells for which I need the average.", 'start': 26388.926, 'duration': 4.284}, {'end': 26396.614, 'text': 'And in our case, it is the salary column, which is edge to edge.', 'start': 26393.671, 'duration': 2.943}, {'end': 26399.212, 'text': 'I close the bracket and press enter.', 'start': 26397.412, 'duration': 1.8}, {'end': 26401.473, 'text': 'I get the average salary as 15,538.', 'start': 26399.532, 'duration': 1.941}, {'end': 26405.953, 'text': 'This is the average amount paid to each of the employees in the organization.', 'start': 26401.473, 'duration': 4.48}, {'end': 26409.174, 'text': 'The next function in our list is the median function.', 'start': 26406.494, 'duration': 2.68}, {'end': 26413.715, 'text': 'The median function is again a statistical function, which returns the median of a given set of numbers.', 'start': 26409.594, 'duration': 4.121}], 'summary': 'The average salary for employees is 15,538, and the next function is the median.', 'duration': 28.893, 'max_score': 26384.822, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA26384822.jpg'}, {'end': 26775.876, 'src': 'embed', 'start': 26728.43, 'weight': 4, 'content': [{'end': 26736.213, 'text': 'If I want to know the first highest value or the first largest value from the list, I can put the number as one.', 'start': 26728.43, 'duration': 7.783}, {'end': 26742.916, 'text': "But if I don't want to know the first largest, and I want to know the second largest salary paid to an employee,", 'start': 26736.714, 'duration': 6.202}, {'end': 26745.157, 'text': 'I will enter number two,', 'start': 26742.916, 'duration': 2.241}, {'end': 26746.298, 'text': 'close the bracket, and press enter.', 'start': 
26745.157, 'duration': 1.141}, {'end': 26751.019, 'text': 'Now, if you see, our largest 
value is actually 49,000.', 'start': 26746.718, 'duration': 4.301}, {'end': 26754.441, 'text': 'However, the answer that you get is 24,500.', 'start': 26751.02, 'duration': 3.421}, {'end': 26758.123, 'text': 'Why? Because we are looking at the second largest value in the range of cells.', 'start': 26754.441, 'duration': 3.682}, {'end': 26762.508, 'text': "If I am looking for the fifth largest value, I'll put it as five and press enter.", 'start': 26758.826, 'duration': 3.682}, {'end': 26768.551, 'text': 'It will give me the fifth largest value in the set of cells, which is 22,750.', 'start': 26762.548, 'duration': 6.003}, {'end': 26773.155, 'text': 'Like in our example, where I mentioned that Asha Trivedi is now moving to the board of directors.', 'start': 26768.552, 'duration': 4.603}, {'end': 26775.876, 'text': "Let's not remove Sheetal Desai for now.", 'start': 26773.795, 'duration': 2.081}], 'summary': 'Explaining how to find the first, second, and fifth largest values in a list of salary data.', 'duration': 47.446, 'max_score': 26728.43, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA26728430.jpg'}], 'start': 25445.339, 'title': 'Analyzing house dataset, data exploration, visualization, and linear regression model training', 'summary': 'Covers the initial analysis of a house dataset from Kaggle, data exploration, and visualization techniques. It also includes the training of a linear regression model to predict housing prices with an achieved accuracy of 0.70. 
Additionally, it covers the use of statistical functions in Excel for average, median, mode, standard deviation, and large functions.', 'chapters': [{'end': 25586.262, 'start': 25445.339, 'title': 'Analyzing house dataset on Kaggle', 'summary': 'Covers the initial analysis of a house dataset from Kaggle, including details on the data structure and key statistics such as the mean, maximum, and minimum values for bedrooms, bathrooms, and square feet.', 'duration': 140.923, 'highlights': ['The data set contains 21,613 entries with 21 columns, making it a substantial and comprehensive data source for analysis.', "The 'describe' method provides essential statistics such as the mean, maximum, and minimum values for bedrooms, bathrooms, and square feet, with the mean bedroom count being 3 and the mean square feet being 2079.", "The most common entry for bedrooms is a three-bedroom house, while for bathrooms, it is a two-bathroom house, providing valuable insights into the dataset's common property features."]}, {'end': 25981.027, 'start': 25586.262, 'title': 'Data exploration and visualization', 'summary': 'Outlines the process of data exploration and visualization, including checking for null values, visualizing relationships between variables, and identifying redundancies in the dataset, emphasizing the importance of bedrooms, bathrooms, square feet living, and waterfront in predicting house prices.', 'duration': 394.765, 'highlights': ['The chapter emphasizes the importance of bedrooms, bathrooms, square feet living, and waterfront in predicting house prices. The speaker emphasizes the importance of bedrooms, bathrooms, square feet living, and waterfront in predicting house prices.', 'The process involves checking for null values and ensuring a clean dataset to avoid hindrances during modeling. 
The process involves checking for null values in the dataset to ensure a clean dataset for modeling and avoiding hindrances.', 'Visualizing relationships between variables, such as bedrooms, bathrooms, square feet living, floors, and waterfront, is a key part of the data exploration process. Visualizing relationships between variables, including bedrooms, bathrooms, square feet living, floors, and waterfront, is a crucial part of the data exploration process.']}, {'end': 26335.074, 'start': 25982.688, 'title': 'Linear regression model training', 'summary': 'Covers the training of a linear regression model to predict housing prices, including data segregation, model fitting, and performance evaluation, achieving an accuracy of 0.70, and suggestions for improving accuracy by refining input data.', 'duration': 352.386, 'highlights': ["The model achieved an accuracy score of 0.70 when evaluated with the test data. The accuracy of the linear regression model when evaluated with the test data was 0.70, indicating the model's predictive capability.", "Suggestions were provided to improve the model's accuracy by selectively including or excluding certain features such as latitude, longitude, zip code, waterfront, and view, while emphasizing the importance of bedrooms, bathrooms, and square footage. Recommendations were made to enhance the model's accuracy by selectively including or excluding features, such as latitude, longitude, zip code, waterfront, and view, while emphasizing the importance of bedrooms, bathrooms, and square footage.", 'A task was assigned to explore other classifiers and regression models, such as random forest classifier, decision tree, and logistic regression, for predicting housing prices using the same data. 
The audience was encouraged to explore other classifiers and regression models, including random forest classifier, decision tree, and logistic regression, for predicting housing prices using the same data.']}, {'end': 26797.642, 'start': 26335.094, 'title': 'Excel statistical functions', 'summary': 'Covers the use of statistical functions in excel, including average, median, mode, standard deviation, and large functions, with examples and results for average salary, median value, most frequently occurring salary, standard deviation, and nth largest salary.', 'duration': 462.548, 'highlights': ['The average function calculates the average salary of employees, resulting in an average salary of 15,538. The average function in Excel calculates the average salary of employees, resulting in an average salary of 15,538.', 'The mode function identifies the salary amount occurring most frequently, revealing 17,500 as the most occurring amount. The mode function in Excel identifies the salary amount occurring most frequently, revealing 17,500 as the most occurring amount.', 'The standard deviation function calculates the standard deviation of the sample set of data, yielding a standard deviation value of 4.89 for the scores. The standard deviation function in Excel calculates the standard deviation of the sample set of data, yielding a standard deviation value of 4.89 for the scores.', 'The large function returns the nth largest value from the sample, allowing for the identification of the nth largest salary paid to employees. The large function in Excel returns the nth largest value from the sample, allowing for the identification of the nth largest salary paid to employees.', 'The median function determines the middle value in the salary column, resulting in a median salary value of 15,750. 
The median function in Excel determines the middle value in the salary column, resulting in a median salary value of 15,750.']}], 'duration': 1352.303, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA25445339.jpg', 'highlights': ['The model achieved an accuracy score of 0.70 when evaluated with the test data.', "The 'describe' method provides essential statistics such as the mean, maximum, and minimum values for bedrooms, bathrooms, and square feet, with the mean bedroom count being 3 and the mean square feet being 2079.", "The most common entry for bedrooms is a three-bedroom house, while for bathrooms, it is a two-bathroom house, providing valuable insights into the dataset's common property features.", 'Visualizing relationships between variables, including bedrooms, bathrooms, square feet living, floors, and waterfront, is a crucial part of the data exploration process.', 'The average function in Excel calculates the average salary of employees, resulting in an average salary of 15,538.', 'The mode function in Excel identifies the salary amount occurring most frequently, revealing 17,500 as the most occurring amount.', 'The standard deviation function in Excel calculates the standard deviation of the sample set of data, yielding a standard deviation value of 4.89 for the scores.', 'The large function in Excel returns the nth largest value from the sample, allowing for the identification of the nth largest salary paid to employees.', 'The median function in Excel determines the middle value in the salary column, resulting in a median salary value of 15,750.', "Suggestions were provided to improve the model's accuracy by selectively including or excluding certain features such as latitude, longitude, zip code, waterfront, and view, while emphasizing the importance of bedrooms, bathrooms, and square footage.", 'A task was assigned to explore other classifiers and regression models, such as random forest 
classifier, decision tree, and logistic regression, for predicting housing prices using the same data.']}, {'end': 28081.578, 'segs': [{'end': 26825.175, 'src': 'embed', 'start': 26797.642, 'weight': 0, 'content': [{'end': 26802.063, 'text': 'The second largest salary will be 49,000 instead of 24,500.', 'start': 26797.642, 'duration': 4.421}, {'end': 26807.665, 'text': 'Why? Because now there is one more salary that is larger.', 'start': 26802.063, 'duration': 5.602}, {'end': 26811.828, 'text': "The first largest salary has become Asha's salary, which is 60,000.", 'start': 26807.725, 'duration': 4.103}, {'end': 26814.49, 'text': 'Earlier, 49,000 was the largest.', 'start': 26811.829, 'duration': 2.661}, {'end': 26822.234, 'text': "So this way, if you keep changing anybody else's salary, say I make it 65,000, just to get a gist of how it works,", 'start': 26814.49, 'duration': 7.744}, {'end': 26825.175, 'text': 'you will see that earlier 49,000 was the largest.', 'start': 26822.234, 'duration': 2.941}], 'summary': "Asha's salary of 60,000 is now the highest, so the previous largest salary of 49,000 becomes the second largest, replacing 24,500.", 'duration': 27.533, 'max_score': 26797.642, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA26797642.jpg'}, {'end': 27161.21, 'src': 'embed', 'start': 27130.703, 'weight': 2, 'content': [{'end': 27134.206, 'text': 'The glucose level goes to 99, 21.', 'start': 27130.703, 'duration': 3.503}, {'end': 27135.567, 'text': 'The glucose level shows a 65.', 'start': 27134.206, 'duration': 1.361}, {'end': 27141.552, 'text': "But does it make sense just by looking at it? We can't identify what the relationship is.", 'start': 27135.567, 'duration': 5.985}, {'end': 27143.213, 'text': 'Is it minus one, is it one, or is it zero?', 'start': 27141.552, 'duration': 1.661}, {'end': 27152.286, 'text': 'So to identify that, I can use a function called 
correlation function, which is CORREL in Excel, in the cell where I need this result.', 'start': 27143.883, 'duration': 8.403}, {'end': 27161.21, 'text': 'So I type =CORREL, open the bracket, and then I will select the different arrays for which I need to identify the relation.', 'start': 27152.927, 'duration': 8.283}], 'summary': 'Glucose levels range from 21 to 99, and the CORREL (correlation) function is used for analysis.', 'duration': 30.507, 'max_score': 27130.703, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA27130703.jpg'}, {'end': 27386.088, 'src': 'embed', 'start': 27359.736, 'weight': 1, 'content': [{'end': 27365.319, 'text': 'Excel has several different types of charts, allowing you to choose one that best fits your data.', 'start': 27359.736, 'duration': 5.583}, {'end': 27370.261, 'text': 'In order to use charts effectively, you will need to understand how different charts are used.', 'start': 27366.019, 'duration': 4.242}, {'end': 27373.062, 'text': 'So we will start with the column charts.', 'start': 27371.081, 'duration': 1.981}, {'end': 27376.864, 'text': "So you'll see all these different types of charts that we are going to learn today.", 'start': 27373.802, 'duration': 3.062}, {'end': 27383.587, 'text': "So to start, we'll go back to our example sheet where I have the sheet for the column chart.", 'start': 27377.584, 'duration': 6.003}, {'end': 27386.088, 'text': 'That means there is data that is already available.', 'start': 27383.627, 'duration': 2.461}], 'summary': 'Excel offers various chart types, starting with column charts, and uses existing data.', 'duration': 26.352, 'max_score': 27359.736, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA27359736.jpg'}, {'end': 27881.708, 'src': 'embed', 'start': 27854.269, 'weight': 4, 'content': [{'end': 27858.771, 'text': 'which is the colors or the 
bars which are there that are in the colors.', 'start': 
27854.269, 'duration': 4.502}, {'end': 27863.381, 'text': 'that is, the years 2016 and 2017 are shown as two different bars.', 'start': 27859.397, 'duration': 3.984}, {'end': 27866.323, 'text': 'Now, when I click on Switch Row/Column,', 'start': 27863.881, 'duration': 2.442}, {'end': 27876.192, 'text': 'what will happen is that all the regions will become the bar colors, or different bars, while 2016 and 2017 will move to the x-axis.', 'start': 27866.323, 'duration': 9.869}, {'end': 27879.715, 'text': "Let's see how that happens by just clicking on Switch Row/Column.", 'start': 27876.572, 'duration': 3.143}, {'end': 27881.708, 'text': 'Once I do that and click on.', 'start': 27880.288, 'duration': 1.42}], 'summary': 'Data visualization showing 2016 and 2017 as different colored bars with region on the x-axis', 'duration': 27.439, 'max_score': 27854.269, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA27854269.jpg'}, {'end': 27926.613, 'src': 'embed', 'start': 27900.454, 'weight': 3, 'content': [{'end': 27906.736, 'text': 'you can compare the sales that you have made in each region in 2016.', 'start': 27900.454, 'duration': 6.282}, {'end': 27909.257, 'text': 'That kind of comparison can be done to identify patterns.', 'start': 27906.736, 'duration': 2.521}, {'end': 27918.748, 'text': 'If you see, Gujarat has the lowest sales in 2016, while in 2017, Hyderabad has the lowest sales.', 'start': 27909.257, 'duration': 9.491}, {'end': 27926.613, 'text': "So this kind of comparison can also be done, but depending on what kind of a comparison you're looking at, you can always click on switch,", 'start': 27919.228, 'duration': 7.385}], 'summary': 'Comparison of sales by region in 2016 and 2017 reveals the lowest sales in Gujarat in 2016 and Hyderabad in 2017.', 'duration': 26.159, 'max_score': 27900.454, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA27900454.jpg'}], 'start': 26797.642, 'title': 'Salary ranking, Excel correlation, and chart representation', 'summary': 'Covers the identification of salary rankings with specific values, calculating correlation coefficients in Excel, exploring correlations and their implications, and creating recommended and edited charts for data representation.', 'chapters': [{'end': 26888.299, 'start': 26797.642, 'title': 'Salary ranking and identification', 'summary': 'Discusses the process of identifying the largest and smallest salaries in an organization, with the largest salary being 60,000 and the second lowest salary being 5,950, and the process of changing the k value to identify different salary rankings.', 'duration': 90.657, 'highlights': ['The largest salary is now 60,000, previously 49,000, and the second largest salary is 49,000, previously 24,500. The largest salary has changed to 60,000 from 49,000, and the second largest salary is now 49,000 instead of 24,500.', 'The second lowest salary in the organization is 5,950. The second lowest salary paid to the employees in the organization is 5,950.', 'The second lowest salary paid to the employees in the organization is 7,000. 
If the K value is set to 2, the second lowest salary paid to the employees in the organization is 7,000.']}, {'end': 27057.014, 'start': 26888.299, 'title': 'Excel correlation coefficient', 'summary': 'Discusses how to use the correlation function in Excel to find the correlation coefficient between two variables, explaining the concept of correlation and its implications, with a focus on the range of values and their interpretations.', 'duration': 168.715, 'highlights': ['The correlation function in Excel is used to find the correlation coefficient between two variables, indicating the strength and nature of their relationship, with values ranging from -1 to 1.', 'A correlation coefficient close to 1 indicates a strong, positive relationship, where an increase in one variable corresponds to an increase in the other, exemplified by shoe sizes and foot length.', 'Conversely, a correlation coefficient close to -1 indicates a strong, negative relationship, where an increase in one variable corresponds to a decrease in the other, demonstrated by the amount of gas in the tank and the speed of the car.']}, {'end': 27426.017, 'start': 27057.574, 'title': 'Understanding correlation and Excel charts', 'summary': 'Explains the concept of correlation, its values, and application in Excel, with examples of positive and negative correlations, including their impact on stock market investments, and the introduction to Excel charts, specifically column charts.', 'duration': 368.443, 'highlights': ['The chapter explains the concept of correlation, its values, and application in Excel, with examples of positive and negative correlations. The chapter discusses the concept of correlation and its values, including positive and negative correlations, and their impact on relationships between variables.', 'The chapter provides examples of positive and negative correlations, including their impact on stock market investments. 
The chapter illustrates the impact of positive and negative correlations on stock market investments, with examples of the relationship between the S&P 500 and stock A prices.', 'The chapter introduces Excel charts, specifically column charts, and their significance in graphically representing data. The chapter introduces Excel charts, specifically column charts, and explains their importance in graphically representing data for easy visualization of comparisons and trends.']}, {'end': 27665.277, 'start': 27426.017, 'title': 'Creating recommended charts in excel', 'summary': 'Discusses how to use recommended charts in excel to visually represent data for comparison, specifically focusing on creating column charts and adding data labels to accurately display the information.', 'duration': 239.26, 'highlights': ["The chapter explains the new 'recommended chart' feature introduced in 2016, which automatically suggests suitable chart types for the selected data, enhancing the user experience.", 'It details the process of creating a column chart in Excel to compare data, highlighting its usefulness in visually representing and comparing information, such as sales data for different years like 2016 and 2017.', 'The demonstration of adding and removing data labels to the column chart emphasizes the practical steps for enhancing the visualization by displaying exact numerical values, ensuring clear interpretation of the data.']}, {'end': 28081.578, 'start': 27665.277, 'title': 'Excel chart editing and types', 'summary': 'Explains how to edit chart elements in excel, including removing axes, adding titles, and tweaking grid lines, and discusses the usage and creation of column, line, and pie charts for data representation.', 'duration': 416.301, 'highlights': ['The chapter explains how to add and remove chart elements such as axes and titles in Excel. 
It provides steps to remove the primary vertical axis and add a chart title, emphasizing the importance of tweaking grid lines for cleaner data representation.', 'The chapter discusses the creation and usage of column, line, and pie charts for data representation. It explains the usage of column charts for data comparison, line charts for showing trends, and pie charts for comparing proportions of different values.', 'It provides guidance on switching between row and column data for chart representation. It explains how switching rows and columns enables comparison between different years and regions for sales data.']}], 'duration': 1283.936, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA26797642.jpg', 'highlights': ['The largest salary is now 60,000, previously 49,000, and the second largest salary is 49,000, previously 24,500.', 'The correlation function in Excel is used to find the correlation coefficient between two variables, indicating the strength and nature of their relationship, with values ranging from -1 to 1.', 'The chapter explains the concept of correlation, its values, and application in Excel, with examples of positive and negative correlations.', "The chapter explains the new 'recommended chart' feature introduced in 2016, which automatically suggests suitable chart types for the selected data, enhancing the user experience.", 'The chapter discusses the creation and usage of column, line, and pie charts for data representation.']}, {'end': 30555.803, 'segs': [{'end': 28164.631, 'src': 'embed', 'start': 28138.348, 'weight': 3, 'content': [{'end': 28142.114, 'text': 'In the data labels that we have added, you will see that only the numbers have been added.', 'start': 28138.348, 'duration': 3.766}, {'end': 28148.584, 'text': 'I have to check which color belongs to which in order to see where the highest number lands.', 'start': 28142.134, 'duration': 
6.45}, {'end': 28152.61, 'text': 'Here there are only five different lists, so that is fine.', 'start': 28148.924, 'duration': 3.686}, {'end': 28158.646, 'text': 'But if there are about 10 to 12 lists and you want to see which one belongs to which, it is a little difficult to identify.', 'start': 28152.981, 'duration': 5.665}, {'end': 28164.631, 'text': 'So what I can do is add chart labels on each of them, like data labels.', 'start': 28159.106, 'duration': 5.525}], 'summary': 'Data labels added for 5 lists, but for 10-12 lists, chart labels needed for clear identification.', 'duration': 26.283, 'max_score': 28138.348, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA28138348.jpg'}, {'end': 28284.807, 'src': 'embed', 'start': 28248.057, 'weight': 0, 'content': [{'end': 28252.459, 'text': 'If I insert the normal column chart like this, now the data is smaller.', 'start': 28248.057, 'duration': 4.402}, {'end': 28254.26, 'text': 'So it is just showing everything in that same line.', 'start': 28252.479, 'duration': 1.781}, {'end': 28262.624, 'text': 'However, if you have bigger data, it will look even better with bar charts, which are shown horizontally, like this,', 'start': 28254.68, 'duration': 7.944}, {'end': 28270.688, 'text': 'because you can clearly read the headings here on the left-hand side and your bars go on the right. Again, removing the grid lines', 'start': 28262.624, 'duration': 8.064}, {'end': 28272.269, 'text': 'and adding the data labels will remain the same.', 'start': 28270.688, 'duration': 1.581}, {'end': 28275.999, 'text': 'Moving on to the next one, that is the surface chart.', 'start': 28272.791, 'duration': 3.208}, {'end': 28279.426, 'text': 'What is the surface chart and how is it useful?', 'start': 28275.999, 'duration': 3.427}, {'end': 28284.807, 'text': 'Surface charts are useful when you want to find the optimum 
combination between two sets of data.', 'start': 28280.126, 'duration': 4.681}], 'summary': 'Bar charts are better for bigger data; surface charts are useful for finding optimum combinations.', 'duration': 36.75, 'max_score': 28248.057, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA28248057.jpg'}, {'end': 28886.666, 'src': 'embed', 'start': 28848.926, 'weight': 2, 'content': [{'end': 28855.191, 'text': 'Once I do that, it opens a dialog where I select whether to sort smallest to largest or largest to smallest.', 'start': 28848.926, 'duration': 6.265}, {'end': 28857.072, 'text': 'I will choose largest to smallest.', 'start': 28855.491, 'duration': 1.581}, {'end': 28864.776, 'text': "And I can see that the sales department has the highest salary, which is 3,45,075.", 'start': 28857.832, 'duration': 6.944}, {'end': 28867.059, 'text': 'So this is how sorting works in the pivot table.', 'start': 28864.777, 'duration': 2.282}, {'end': 28873.92, 'text': 'There are other really important features of the pivot table that we are going to see here.', 'start': 28867.597, 'duration': 6.323}, {'end': 28877.421, 'text': 'There is something called a slicer in a pivot table.', 'start': 28874.52, 'duration': 2.901}, {'end': 28886.666, 'text': 'What is a slicer in a pivot table, and how does it work? 
When you go to the Analyze tab, there is an option called Insert Slicer.', 'start': 28878.762, 'duration': 7.904}], 'summary': 'Sorted the pivot table from largest to smallest, showing that the sales department has the highest salary (3,45,075), and introduced the slicer.', 'duration': 37.74, 'max_score': 28848.926, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA28848926.jpg'}], 'start': 28081.578, 'title': 'Data analysis and visualization', 'summary': "Covers visualizing revenue using pie charts, bar charts, and surface charts, creating pivot tables in excel, understanding pivot table structure and function, creating pivot charts, and connecting multiple pivot tables to one slicer for simultaneous filtering of data. it also includes using excel's analysis tool pack for financial and statistical data analysis and visualizing covid-19 data in italy over the last 34 days.", 'chapters': [{'end': 28400.577, 'start': 28081.578, 'title': 'Visualizing revenue', 'summary': 'Demonstrates using pie charts, bar charts, and surface charts to analyze revenue data and visualize the distribution of book types and their corresponding revenue. 
it also explains the process of adding data labels and chart elements to improve the visualization.', 'duration': 318.999, 'highlights': ['Using pie charts to visualize book type distribution and revenue The chapter discusses utilizing pie charts to visualize the distribution of book types, where the mystery genre generates the maximum revenue.', 'Enhancing visualization by adding data labels and chart elements The process of adding data labels and chart elements to pie charts and bar charts is explained to improve visual clarity and comprehension of the revenue data.', 'Utilizing surface charts to analyze the combination of data sets The use of surface charts to analyze the optimum combination between different data sets, such as recruitment in various departments like marketing, finance, and effort, is demonstrated.']}, {'end': 28714.606, 'start': 28401.277, 'title': 'Excel charts and pivot tables', 'summary': 'Covers how to identify data using surface charts and create pivot tables in excel, where pivot tables can be created in less than a minute if the source data is well-organized, and the pivot table wizard allows for easy selection and placement of fields for analysis.', 'duration': 313.329, 'highlights': ['Pivot tables can be created in less than a minute if the source data is well-organized Compared to the time it would take to build an equivalent report manually, pivot tables are incredibly fast.', 'The pivot table wizard allows for easy selection and placement of fields for analysis The pivot table wizard makes it simple to select the table or range, choose the location for the pivot table, and understand the four fields in the pivot table.', 'Identifying data using surface charts Surface charts are used to identify data, with the range of data being 400 to 600.']}, {'end': 29019.134, 'start': 28715.106, 'title': 'Pivot table: structure and function', 'summary': 'Explains the structure and function of a pivot table, including how to organize and filter 
data, sort values, and utilize slicers for efficient data analysis and presentation.', 'duration': 304.028, 'highlights': ['The pivot table has four areas (filters, columns, rows, and values) for organizing and categorizing data. It demonstrates placing fields in the rows, columns, and filters areas for efficient categorization.', 'The filter area in the pivot table enables easy application of filters, allowing users to select specific data based on criteria such as region or branch. It illustrates how the filter area facilitates easy application of filters, showcasing the selection of specific data based on region or branch.', 'Sorting in the pivot table allows for the arrangement of data in ascending or descending order, providing insights such as identifying the department with the highest salary. It explains the process of sorting data in ascending or descending order to gain insights, such as identifying the department with the highest salary.', 'Slicers in the pivot table offer a filtering mechanism for efficient data analysis, allowing for the selection of specific items, including multi-select, for enhanced data visualization. 
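The Excel workflow described here (filter with a slicer, then sort largest to smallest) has a rough pandas counterpart. The sketch below is an illustration only. It assumes a hypothetical employee table with department, region and salary columns and made-up values. A boolean filter with isin plays the role of a multi-select slicer:

```python
import pandas as pd

# Hypothetical employee table; all values are made up for illustration.
df = pd.DataFrame({
    "department": ["Sales", "Sales", "Finance", "Marketing", "Finance"],
    "region": ["North", "South", "North", "South", "South"],
    "salary": [50000, 45000, 60000, 40000, 30000],
})

# Slicer-like filter: keep only the selected regions.
selected_regions = ["South"]
selected = df[df["region"].isin(selected_regions)]

# Pivot: departments as rows, summed salary as the value field.
pivot = selected.pivot_table(index="department", values="salary", aggfunc="sum")

# Sort largest to smallest, like the pivot table sort option.
pivot = pivot.sort_values("salary", ascending=False)
print(pivot)
```

Adding more names to selected_regions mimics selecting several slicer buttons at once.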
It details the functionality of slicers as a filtering mechanism, showcasing the selection of specific fields and multi-select options for enhanced data visualization.']}, {'end': 29309.098, 'start': 29019.955, 'title': 'Creating pivot charts from pivot tables', 'summary': 'Explains how to create pivot charts from pivot tables, add and customize data labels, switch between row and column data, and link multiple pivot tables using slicers, improving data visualization and analysis for day-to-day reporting.', 'duration': 289.143, 'highlights': ['Pivot charts automatically update with new data and allow selective data display Pivot charts are linked to pivot tables and automatically update with new data, allowing users to request specific information for display rather than showing all data and then selecting parts to display.', 'Linking slicers to multiple pivot tables for synchronized data analysis Slicers can be linked to multiple pivot tables, allowing synchronized data analysis and enabling users to select and analyze data for specific criteria across multiple pivot tables.', 'Customizing pivot charts similar to normal charts Pivot charts can be customized with chart titles, data labels, and switching between row and column data, providing similar functionalities to normal charts for enhanced data visualization.']}, {'end': 29943.304, 'start': 29309.098, 'title': 'Pivot table slicer and excel data analysis', 'summary': 'Explains how to connect multiple pivot tables to one slicer, enabling the simultaneous filtering of data, and demonstrates the usage of the analysis tool pack in excel for financial and statistical data analysis, including loading the tool pack and performing a regression analysis, along with providing insights into the data science life cycle and project suggestions for beginners.', 'duration': 634.206, 'highlights': ['Connecting Multiple Pivot Tables to One Slicer Demonstrates the method to connect two or three pivot tables to one slicer, 
allowing for simultaneous filtering of data with a single slicer.', 'Usage of Analysis Tool Pack in Excel Explains the process of loading the analysis tool pack in Excel, which provides data analysis tools for financial, statistical, and engineering data analysis, and includes the steps to access and utilize the tool pack for performing data analysis.', 'Performing Regression Analysis in Excel Illustrates the procedure for performing regression analysis in Excel using the data analysis tool pack, including selecting input ranges, interpreting the summary output, and understanding basic regression statistics and analysis of variance.', 'Insights into Data Science Life Cycle and Beginner Project Suggestions Provides insights into the data science life cycle, covering processes like data collection, cleaning, visualization, model selection, and deployment, and offers project suggestions for beginners, emphasizing the importance of mastering the basics and starting with projects covering various aspects of the data science life cycle.']}, {'end': 30555.803, 'start': 29943.965, 'title': 'Data analysis and visualization', 'summary': 'Covers importing necessary libraries, reading csv data, exploring data using describe method, checking for null values, and visualizing relationships between variables using scatter plots to understand the covid-19 data in italy over the last 34 days, revealing insights such as the number of recoveries, deaths, confirmed cases, and hospitalized individuals.', 'duration': 611.838, 'highlights': ['The scatter plot visualization shows a linear relationship between the total number of cases and the number of people who have recovered, indicating that as the total cases increased, the number of recoveries also increased, with more than 12,000 recoveries for over 100,000 cases. 
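A Pearson correlation coefficient summarizes the strength of such a linear relationship. The Excel CORREL function returns this value, which always lies between -1 and 1. Below is a minimal standard-library sketch. The case and recovery counts are made-up illustration values, not figures from the Italy dataset:

```python
from math import sqrt


def pearson(xs, ys):
    # Pearson correlation coefficient of two equal-length numeric sequences.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalized covariance and variances (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up cumulative totals, for illustration only.
total_cases = [1000, 5000, 20000, 60000, 100000]
recovered = [50, 400, 2000, 7000, 12000]

print(round(pearson(total_cases, recovered), 3))
```

Python 3.10 and later also ship statistics.correlation, which computes the same coefficient. Two cumulative series that both grow over time will correlate strongly almost by construction. A high value here therefore describes how the series move together, not cause and effect.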
Linear relationship between total cases and number of recoveries, with over 12,000 recoveries for over 100,000 cases.', 'The visualization also reveals a concerning trend where the number of deaths followed a similar pattern to the number of recoveries, with more than 10,000 deaths for the same period, and a significant rise in deaths as the total cases increased. More than 10,000 deaths for the same period, with a significant rise in deaths as the total cases increased.', 'The analysis indicates an exponential increase in the total number of confirmed cases, reaching more than 70,000 out of 100,000 cases over the last 34 days in Italy. Exponential increase in the total number of confirmed cases, reaching more than 70,000 out of 100,000 cases over the last 34 days in Italy.', 'A slight curve is observed in the relationship between the number of people hospitalized with symptoms and the total number of confirmed cases, indicating that while more than 25,000 individuals were hospitalized with symptoms, the total cases exceeded 70,000, suggesting that having symptoms does not guarantee immunity from the virus. More than 25,000 individuals hospitalized with symptoms, but the total cases exceeded 70,000, suggesting that having symptoms does not guarantee immunity from the virus.', 'The chapter also covers importing necessary libraries, reading CSV data, and exploring data using the describe method to understand the COVID-19 data in Italy over the last 34 days. 
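A minimal pandas sketch of these first steps: read a CSV, summarize it with describe, and count null values. The column names and values are hypothetical stand-ins, not the real Italy file. An in-memory string replaces the file path so that the example runs on its own:

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file: made-up values, with one
# deliberately missing recovered entry on the third day.
csv_text = """date,total_cases,recovered,deaths
2020-03-01,1000,50,20
2020-03-02,2000,100,50
2020-03-03,3000,,80
2020-03-04,5000,400,150
"""

# read_csv accepts a file path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text))

# count, mean, std, min, quartiles and max for each numeric column.
print(data.describe())

# Number of null values in each column.
print(data.isnull().sum())
```

Because the missing value is excluded, describe reports a count of 3 for recovered. This is an early hint that the column has gaps.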
Importing necessary libraries, reading CSV data, and exploring data using the describe method to understand the COVID-19 data in Italy over the last 34 days.']}], 'duration': 2474.225, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA28081578.jpg', 'highlights': ['Utilizing surface charts to analyze the combination of data sets', 'Enhancing visualization by adding data labels and chart elements', 'Using pie charts to visualize book type distribution and revenue', 'Pivot tables can be created in less than a minute if the source data is well-organized', 'Linking slicers to multiple pivot tables for synchronized data analysis', 'Connecting Multiple Pivot Tables to One Slicer', 'The pivot table has four areas (filters, columns, rows, and values) for organizing and categorizing data', 'Pivot charts automatically update with new data and allow selective data display', 'The scatter plot visualization shows a linear relationship between the total number of cases and the number of people who have recovered, indicating that as the total cases increased, the number of recoveries also increased, with more than 12,000 recoveries for over 100,000 cases', 'The analysis indicates an exponential increase in the total number of confirmed cases, reaching more than 70,000 out of 100,000 cases over the last 34 days in Italy']}, {'end': 33259, 'segs': [{'end': 30671.973, 'src': 'embed', 'start': 30642.576, 'weight': 6, 'content': [{'end': 30645.177, 'text': "So what I'm going to do is this: with some data sets,", 'start': 30642.576, 'duration': 2.601}, {'end': 30652.4, 'text': 'you may want to understand changes in one variable as a function of time, or a similarly continuous variable.', 'start': 30645.657, 'duration': 6.743}, {'end': 30655.821, 'text': 'In this situation, a good choice is to draw a line plot.', 'start': 30652.4, 'duration': 3.421}, {'end': 30666.75, 'text': 'So in seaborn we can accomplish this with the lineplot 
function, or we can just use relplot and set kind to line, right guys.', 'start': 30656.365, 'duration': 10.385}, {'end': 30671.973, 'text': "So we'll wait for the output and then we'll take a look at the next plot that we are going to work on.", 'start': 30667.471, 'duration': 4.502}], 'summary': 'Using seaborn to create line plots for data analysis.', 'duration': 29.397, 'max_score': 30642.576, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA30642576.jpg'}, {'end': 30845.744, 'src': 'embed', 'start': 30822.143, 'weight': 9, 'content': [{'end': 30828.809, 'text': 'So this is the bright side, guys, and similarly we can also check other relationships, for example intensive care,', 'start': 30822.143, 'duration': 6.666}, {'end': 30835.194, 'text': 'like people in intensive care, and all the other variables in this data set.', 'start': 30828.809, 'duration': 6.385}, {'end': 30841.239, 'text': "Let's take a look at how we can actually plot a few graphs using the categorical scatter plots.", 'start': 30835.594, 'duration': 5.645}, {'end': 30845.744, 'text': 'So the default representation of the data in catplot uses a scatter plot.', 'start': 30841.963, 'duration': 3.781}], 'summary': 'Analyzing the data set using categorical scatter plots.', 'duration': 23.601, 'max_score': 30822.143, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA30822143.jpg'}, {'end': 30877.677, 'src': 'embed', 'start': 30856.568, 'weight': 8, 'content': [{'end': 30870.153, 'text': 'So they take different approaches to resolving the main challenge in representing categorical data with a scatter plot, which is that all of the points belonging to one category would fall on the same position along the axis corresponding to the categorical variable.', 'start': 30856.568, 'duration': 13.585}, {'end': 30877.677, 'text': 'So the 
approach used by the catplot is to adjust the positions of points on the categorical axis with a small amount of random jitter.', 'start': 30870.774, 'duration': 6.903}], 'summary': 'Catplot adjusts the positions of points on the categorical axis with a small amount of random jitter.', 'duration': 21.109, 'max_score': 30856.568, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA30856568.jpg'}, {'end': 31007.586, 'src': 'embed', 'start': 30979.483, 'weight': 2, 'content': [{'end': 30984.187, 'text': 'Everything is going up. Whatever you compare, the deaths are going up and the recovered patients are also going up.', 'start': 30979.483, 'duration': 4.704}, {'end': 30986.369, 'text': "So that's the brighter side, though.", 'start': 30984.727, 'duration': 1.642}, {'end': 30993.635, 'text': "So that is the conclusion that we can draw from this project, with all the graphs that I've plotted.", 'start': 30986.389, 'duration': 7.246}, {'end': 30995.597, 'text': 'So we had our data, guys,', 'start': 30994.356, 'duration': 1.241}, {'end': 31003.163, 'text': 'in which we had all these values for people hospitalized with symptoms and people in the ICU;', 'start': 30997.558, 'duration': 5.605}, {'end': 31007.586, 'text': 'the total hospitalized people and the home quarantine people were there as well.', 'start': 31003.163, 'duration': 4.423}], 'summary': 'Deaths and recoveries are both increasing; the rise in recoveries is the positive sign in the data.', 'duration': 28.103, 'max_score': 30979.483, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA30979483.jpg'}, {'end': 31738.791, 'src': 'embed', 'start': 31710.316, 'weight': 5, 'content': [{'end': 31715.662, 'text': "So in order to do that, let me show you what I'm trying to say if I type here data dot info.", 'start': 31710.316, 'duration': 5.346}, {'end': 31718.664, 'text': "You'll see that, you know, this method still exists.", 'start': 31716.443, 
'duration': 2.221}, {'end': 31723.466, 'text': 'The reason is that I have not updated our data set, and there are two ways to do it.', 'start': 31719.064, 'duration': 4.402}, {'end': 31729.868, 'text': 'You can either assign the result here to a new variable data, or you can use inplace.', 'start': 31723.526, 'duration': 6.342}, {'end': 31733.75, 'text': 'So I add inplace=True here.', 'start': 31730.388, 'duration': 3.362}, {'end': 31738.791, 'text': 'This saves the time of assigning a new variable, and I just hit enter.', 'start': 31734.33, 'duration': 4.461}], 'summary': 'Demonstrates updating the data set in place with inplace=True instead of assigning the result to a new variable.', 'duration': 28.475, 'max_score': 31710.316, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA31710316.jpg'}, {'end': 31892.148, 'src': 'embed', 'start': 31853.166, 'weight': 1, 'content': [{'end': 31855.047, 'text': 'So count win is nothing but a dictionary,', 'start': 31853.166, 'duration': 1.881}, {'end': 31859.063, 'text': 'and the keys over here are Mumbai Indians, Chennai Super Kings and so on.', 'start': 31855.882, 'duration': 3.181}, {'end': 31863.043, 'text': 'So let me quickly give that up here dot keys.', 'start': 31859.603, 'duration': 3.44}, {'end': 31865.044, 'text': 'Let me hit enter as of now.', 'start': 31863.383, 'duration': 1.661}, {'end': 31870.945, 'text': 'Let me comment this and let me see what our labels contain as you can see here X is not found.', 'start': 31865.064, 'duration': 5.881}, {'end': 31876.386, 'text': "The reason why I'm getting that is because this has to be an uppercase and let me execute this once again.", 'start': 31871.325, 'duration': 5.061}, {'end': 31883.067, 'text': 'So as you can see here, we are getting the list of all the teams that have won, and now we are going to plot this.', 'start': 31876.886, 'duration': 6.181}, {'end': 31889.501, 'text': 
"So to plot, what I'm going to do here is take plt.subplots.", 'start': 31884.366, 'duration': 5.135}, {'end': 31892.148, 'text': "Here I'm going to give the figure size.", 'start': 31890.824, 'duration': 1.324}], 'summary': 'Processing a dictionary of teams and plotting the data.', 'duration': 38.982, 'max_score': 31853.166, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA31853166.jpg'}, {'end': 33226.002, 'src': 'embed', 'start': 33191.055, 'weight': 0, 'content': [{'end': 33193.958, 'text': "I'll just be changing my consumer key and secret key.", 'start': 33191.055, 'duration': 2.903}, {'end': 33196.881, 'text': "Let's regenerate the consumer key and secret key.", 'start': 33194.718, 'duration': 2.163}, {'end': 33203.413, 'text': "Okay, it's done. Now my access token key and my access token secret.", 'start': 33198.142, 'duration': 5.271}, {'end': 33205.355, 'text': 'Let me regenerate them too.', 'start': 33203.974, 'duration': 1.381}, {'end': 33212.797, 'text': 'So you mainly need four variables to authenticate: the consumer key, which is our API key, the consumer secret,', 'start': 33206.375, 'duration': 6.422}, {'end': 33216.439, 'text': 'which is the API secret, the access token, and the access token secret.', 'start': 33212.797, 'duration': 3.642}, {'end': 33217.718, 'text': 'All right, fine.', 'start': 33216.718, 'duration': 1}, {'end': 33226.002, 'text': "So let's start by importing tweepy: import tweepy. Next, as I said, I'll be using TextBlob.", 'start': 33218.279, 'duration': 7.723}], 'summary': 'Regenerating consumer key, secret key, access token key, and access token secret for authentication.', 'duration': 34.947, 'max_score': 33191.055, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA33191055.jpg'}], 'start': 30556.244, 'title': 'Visualizing data relationships and techniques', 'summary': 'Covers visualizing data 
relationships, including insights about the relationship between home quarantine and total cases, using categorical scatter plots to visualize covid-19 data, the data science life cycle, and exploratory data analysis of ipl teams, providing practical applications and demonstrating the importance of data visualization in decision-making.', 'chapters': [{'end': 30821.803, 'start': 30556.244, 'title': 'Visualizing data relationships', 'summary': 'Covers the process of visualizing data relationships using different plot types and highlights the importance of adapting visual representations for specific datasets and questions, alongside insights about the relationship between home quarantine and total cases as well as the impact of home quarantine on flattening the curve and reducing total cases.', 'duration': 265.559, 'highlights': ['The relationship between home quarantine and total cases is highlighted: more than 40,000 people were in home quarantine when total cases reached around 100,000. The total number of cases is probably around 100,000, and more than 40,000 people are in home quarantine.', 'The speaker argues that home quarantine helps flatten the curve, suggesting that having more people in quarantine can lead to a significant decrease in total cases. If more people are in quarantine, the total number of cases is going to be very low. When more than 10,000 people were in home quarantine, more than 4,000 people had recovered.', 'The importance of adapting visual representations for specific datasets and questions is emphasized, with a reminder that there is no universally optimal type of visualization. Scatter plots are highly effective, but there is no universally optimal type of visualization. 
The visual representation should be adapted to the specifics of the data set and to the question you are trying to answer with the plot.']}, {'end': 31191.696, 'start': 30822.143, 'title': 'Categorical scatter plots for covid-19 data', 'summary': 'Discusses using categorical scatter plots to visualize covid-19 data, highlighting the increasing number of recoveries, the need to flatten the curve of total cases and hospitalizations, and the concern over rising deaths.', 'duration': 369.553, 'highlights': ['The number of people recovering from COVID-19 is increasing rapidly, indicating a positive trend in the battle against the virus.', 'We need to focus on flattening the curve of total confirmed cases and hospitalized individuals to prevent overwhelming healthcare systems.', 'The rising number of deaths due to COVID-19 is a cause for concern, and efforts should be made to mitigate the impact and prevent further escalation.']}, {'end': 32102.525, 'start': 31191.696, 'title': 'Data science life cycle & data visualization', 'summary': 'Covers the data science life cycle, including gathering and loading data into a program, visualizing data using the seaborn library, benefits of data visualization in decision-making, and tools for data visualization. it also demonstrates the practical application of data visualization using the ipl dataset, including the use of pie charts and count plots to visualize the most wins in ipl matches and eliminator rounds.', 'duration': 910.829, 'highlights': ['The chapter covers the data science life cycle, including gathering and loading data into a program. It explains the process of gathering data from sources like Kaggle, importing data into a program using pandas, and handling errors such as truncated errors while loading data.', 'Demonstrates the practical application of data visualization using the IPL dataset, including the use of pie charts and count plots to visualize the most wins in IPL matches and eliminator rounds. 
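Counting wins per team produces the dictionary whose keys become the pie chart labels. Below is a standard-library sketch. It assumes a made-up list of match winners, whereas a real analysis would take this column from the IPL matches dataset:

```python
from collections import Counter

# Made-up list of match winners, for illustration only.
winners = [
    "Mumbai Indians", "Chennai Super Kings", "Mumbai Indians",
    "Kolkata Knight Riders", "Chennai Super Kings", "Mumbai Indians",
]

# Counter is a dict subclass: team name -> number of wins.
count_win = Counter(winners)

labels = list(count_win.keys())    # pie slice labels
sizes = list(count_win.values())   # pie slice sizes

# Share of wins per team, most wins first.
total = sum(sizes)
for team, wins in count_win.most_common():
    print(f"{team}: {wins} wins ({100 * wins / total:.1f}%)")
```

The labels and sizes lists are in the form matplotlib.pyplot.pie expects: sizes go in as the first argument and labels through the labels keyword. The printed percentages match what a one-decimal autopct format would display on the slices.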
It showcases the practical use of data visualization techniques such as pie charts and count plots to visualize the most wins in IPL matches and eliminator rounds, providing insights into the analysis of sports data.', 'Benefits of data visualization in decision-making and the tools for data visualization. It highlights the benefits of data visualization in decision-making, such as identifying correlations, examining trends over time, analyzing frequency, exploring markets, and understanding risk and reward. It also introduces various tools for data visualization, including both dedicated software and libraries integrated with programming languages like Python.']}, {'end': 32644.984, 'start': 32102.525, 'title': 'Exploratory data analysis of ipl teams', 'summary': 'Explores the importance of domain expertise in performing exploratory data analysis (eda) and demonstrates the process of capturing, analyzing, and visualizing the toss decision data of ipl teams, highlighting the number of times each team has won the toss and their decision to bat or field.', 'duration': 542.459, 'highlights': ['Explaining the importance of domain expertise in EDA Emphasizes the significance of having domain expertise when performing EDA, highlighting the potential mistakes due to lack of knowledge and the importance of understanding the internal workings of the domain.', 'Capturing and analyzing toss decision data Demonstrates the process of capturing toss winner data, creating a data frame, and analyzing the number of times each team won the toss and their decision to bat or field, with examples of specific teams and their corresponding decisions.', "Visualizing toss decision data Illustrates the visualization of the toss decision data using Seaborn's cat plot, showcasing the number of times each team won the toss and their decision to bat or field in a graphical form, with appropriate labeling and rotation."]}, {'end': 33259, 'start': 32646.265, 'title': 'Cricket data visualization', 
'summary': 'Explores data visualization of cricket statistics, including insights on team performance, famous venues, and umpire statistics, using bar graphs and plots, providing quantifiable data and visual representation, and highlights the importance of data visualization in creating relationships and models.', 'duration': 612.735, 'highlights': ['The chapter explores data visualization of cricket statistics, including insights on team performance, famous venues, and umpire statistics. The chapter covers the visualization of cricket statistics, such as team performance, famous venues, and umpire statistics, using bar graphs and plots.', 'Provides quantifiable data and visual representation of team performance, famous venues, and umpire statistics. The visualization provides quantifiable data on team performance, famous venues, and umpire statistics through bar graphs and plots, showcasing the number of matches, choices made, and frequency of occurrences.', 'Highlights the importance of data visualization in creating relationships and models. 
Emphasizes the significance of data visualization in creating relationships and models, particularly in understanding dependencies and sentiments, as well as the process of authenticating and using Tweepy for sentiment analysis of Twitter data.']}], 'duration': 2702.756, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA30556244.jpg', 'highlights': ['More than 40,000 people in home quarantine leading to around 100,000 total cases.', 'Insight into the impact of home quarantine on flattening the curve and reducing total cases is provided.', 'Scatter plots are highly effective, but there is no universally optimal type of visualization.', 'The number of people recovering from COVID-19 is increasing rapidly.', 'We need to focus on flattening the curve of total confirmed cases and hospitalized individuals.', 'The rising number of deaths due to COVID-19 is a cause for concern.', 'The chapter covers the data science life cycle, including gathering and loading data into a program.', 'Demonstrates the practical application of data visualization using IPL dataset.', 'Benefits of data visualization in decision-making and the tools for data visualization.', 'Emphasizes the significance of having domain expertise when performing EDA.', 'Demonstrates the process of capturing toss winner data, creating a data frame, and analyzing the number of times each team won the toss and their decision to bat or field.', "Illustrates the visualization of the toss decision data using Seaborn's cat plot.", 'The chapter explores data visualization of cricket statistics, including insights on team performance, famous venues, and umpire statistics.', 'Provides quantifiable data and visual representation of team performance, famous venues, and umpire statistics.', 'Emphasizes the significance of data visualization in creating relationships and models.']}, {'end': 34731.912, 'segs': [{'end': 33765.727, 'src': 'embed', 'start': 33742.42, 
'weight': 7, 'content': [{'end': 33749.603, 'text': "So now is the time we try to find the answer to the question we put forth in the problem statement, that is, who will be the world's best playing 11.", 'start': 33742.42, 'duration': 7.183}, {'end': 33750.984, 'text': 'Let us begin our analysis.', 'start': 33749.603, 'duration': 1.381}, {'end': 33757.742, 'text': 'So, as discussed earlier, I will be using a 4-3-3 lineup, which means we need to find the 4 best defenders,', 'start': 33752.059, 'duration': 5.683}, {'end': 33762.665, 'text': '3 best midfielders and 3 best attackers, plus 1 goalkeeper.', 'start': 33757.742, 'duration': 4.923}, {'end': 33765.727, 'text': 'So let us start our quest by finding a goalkeeper first.', 'start': 33763.025, 'duration': 2.702}], 'summary': "Finding the world's best playing 11 based on a 4-3-3 lineup involving 4 defenders, 3 midfielders, 3 attackers, and 1 goalkeeper.", 'duration': 23.307, 'max_score': 33742.42, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA33742420.jpg'}, {'end': 33901.805, 'src': 'embed', 'start': 33869.957, 'weight': 0, 'content': [{'end': 33872.539, 'text': 'So first I will plot my shot stoppers.', 'start': 33869.957, 'duration': 2.582}, {'end': 33874.672, 'text': 'Now, what have I done over here?', 'start': 33873.012, 'duration': 1.66}, {'end': 33881.994, 'text': 'I have simply plotted a bar plot, passing the variable gkShortStopper and then using the function sort_values.', 'start': 33874.672, 'duration': 7.322}, {'end': 33886.155, 'text': 'We have arranged them in descending order, and I just want the top 5 of them.', 'start': 33881.994, 'duration': 4.161}, {'end': 33891.537, 'text': 'Then on my x-axis I have the names of the players and on the y-axis I have the shot-stopper score.', 'start': 33886.516, 'duration': 5.021}, {'end': 33895.278, 'text': 'So based on these 
parameters you will get a graph somewhat like this.', 'start': 33892.577, 'duration': 2.701}, {'end': 33901.805, 'text': 'It is quite evident from this graph that Manuel Neuer is the best goalkeeper as you can see he tops this chart.', 'start': 33896.42, 'duration': 5.385}], 'summary': 'Plotted bar graph of top 5 shot stoppers, manuel neuer tops the chart.', 'duration': 31.848, 'max_score': 33869.957, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA33869957.jpg'}, {'end': 34497.824, 'src': 'embed', 'start': 34466.161, 'weight': 1, 'content': [{'end': 34467.422, 'text': 'Next we need to plot the beast.', 'start': 34466.161, 'duration': 1.261}, {'end': 34472.205, 'text': 'So over here I have given the position as RCM or RM and then we have just plotted it.', 'start': 34467.722, 'duration': 4.483}, {'end': 34473.727, 'text': 'So let me just see the results.', 'start': 34472.486, 'duration': 1.241}, {'end': 34480.211, 'text': 'So over here as per the analysis I am getting Kante as the best beast or you can say the right central midfielder.', 'start': 34474.007, 'duration': 6.204}, {'end': 34484.014, 'text': 'So here we have completed our hunt of finding the best 3 midfielders.', 'start': 34480.672, 'duration': 3.342}, {'end': 34487.257, 'text': 'So let me just go back to the presentation and let me show you them as well.', 'start': 34484.374, 'duration': 2.883}, {'end': 34491.76, 'text': 'So here I have the controller or you can say the left midfielder as Iniesta.', 'start': 34487.617, 'duration': 4.143}, {'end': 34497.824, 'text': 'Then I have the playmaker as Ozil and we have the beast or you can say the right central midfielder as Kante.', 'start': 34492.162, 'duration': 5.662}], 'summary': 'Identified kante as the best right central midfielder in the analysis.', 'duration': 31.663, 'max_score': 34466.161, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA34466161.jpg'}], 'start': 33259.634, 'title': 'Fifa world cup analysis', 'summary': 'Covers python twitter sentiment analysis and fifa world cup data analysis, including the process of selecting the best goalkeeper and the overall best 11 for the fifa world cup, with specific weighted characteristics and attributes used for selection.', 'chapters': [{'end': 33561.057, 'start': 33259.634, 'title': 'Python twitter sentiment analysis', 'summary': 'Covers how to establish a connection between python and twitter, search for tweets regarding donald trump, fetch live tweets from twitter, and analyze sentiment scores, and also introduces a fifa world cup analysis using python, including steps for collecting and analyzing the dataset, and importing required libraries.', 'duration': 301.423, 'highlights': ['The chapter covers how to establish a connection between Python and Twitter, search for tweets regarding Donald Trump, fetch live tweets from Twitter, and analyze sentiment scores. The process involves establishing a connection between Python and Twitter, searching for tweets related to Donald Trump, fetching live tweets from Twitter, and analyzing sentiment scores based on the polarity and subjectivity of the data.', 'Introduces a FIFA World Cup analysis using Python, including steps for collecting and analyzing the dataset, and importing required libraries. 
The chapter introduces a FIFA World Cup analysis using Python, which includes steps for collecting, analyzing the dataset, and importing required libraries such as pandas, NumPy, Matplotlib, and Seaborn for data analysis, manipulation, and visualization.']}, {'end': 33941.901, 'start': 33561.577, 'title': 'Data analysis and visualization for world cup team selection', 'summary': 'Covers data cleaning, visualization using matplotlib and seaborn libraries, and the process of selecting the best goalkeeper for the world cup team based on specific weighted characteristics, with manuel neuer identified as the best choice.', 'duration': 380.324, 'highlights': ['The process of selecting the best goalkeeper for the World Cup team involved analyzing specific weighted characteristics such as shot stopper and sweeper, with Manuel Neuer identified as the best choice. The chapter covers the process of selecting the best goalkeeper for the World Cup team based on specific weighted characteristics such as shot stopper and sweeper, with Manuel Neuer identified as the best choice.', 'The chapter emphasizes the importance of data cleaning and visualization using Matplotlib and Seaborn libraries to gain statistical insights and analyze the dataset. The chapter emphasizes the importance of data cleaning and visualization using Matplotlib and Seaborn libraries to gain statistical insights and analyze the dataset.', "The process of data cleaning involves removing redundant and unwanted columns, with the demonstration of deleting the 'national kit' column and resulting in 5 rows and 52 columns. 
The process of data cleaning involves removing redundant and unwanted columns, with the demonstration of deleting the 'national kit' column and resulting in 5 rows and 52 columns."]}, {'end': 34412.438, 'start': 33942.282, 'title': 'Analyzing fifa world cup data', 'summary': 'Details the process of analyzing fifa world cup data to identify the best goalkeeper, defenders, and midfielders for a 4-3-3 formation using specific attributes and weightage, ultimately concluding with the selection of the best players for each position.', 'duration': 470.156, 'highlights': ['Analyzing goalkeeper performance to select the best for FIFA World Cup based on shot stopping and sweeping scores, determining Manuel Neuer as the top choice. By analyzing shot stopping and sweeping scores, Manuel Neuer is identified as the best goalkeeper for the FIFA World Cup.', 'Utilizing specific attributes and positions to predict and select the four best defenders for a 4-3-3 formation. Using attributes and positions such as LCB, RCB, LWB, and RWB to predict and select the four best defenders for a 4-3-3 formation.', 'Categorizing midfielders as controller, playmaker, and beast, assigning weightage to characteristics to identify the best players for each category. Categorizing midfielders as controller, playmaker, and beast, assigning weightage to characteristics to identify the best players for each category.']}, {'end': 34731.912, 'start': 34412.458, 'title': 'Best 11 for fifa world cup', 'summary': "Discusses the process of selecting the best 11 for the fifa world cup, including the best goalkeeper, 4 best defenders, 3 best midfielders, and 3 best attackers, revealing the selected players for each position and emphasizing that the selection is based on the speaker's opinion.", 'duration': 319.454, 'highlights': ['Manuel Neuer chosen as the best goalkeeper, followed by the selection of 4 best defenders including Alexandro, Ramos, Cesar, and Kyle Walker. 
The best goalkeeper, Manuel Neuer, and the 4 best defenders, including Alexandro, Ramos, Cesar, and Kyle Walker, were selected for the FIFA World Cup.', 'Iniesta, Ozil, and Kante identified as the best 3 midfielders based on analysis and plotting of their respective positions as controller, playmaker, and beast. The best 3 midfielders, Iniesta, Ozil, and Kante, were identified based on analysis and plotting of their respective positions as controller, playmaker, and beast.', 'Ronaldo, Messi, and Lewandowski chosen as the best 3 attackers, with each being the best in their respective positions of left wing attacker, right wing attacker, and striker. The best 3 attackers, Ronaldo, Messi, and Lewandowski, were chosen based on their respective positions of left wing attacker, right wing attacker, and striker.']}], 'duration': 1472.278, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA33259634.jpg', 'highlights': ['The best goalkeeper, Manuel Neuer, and the 4 best defenders, including Alexandro, Ramos, Cesar, and Kyle Walker, were selected for the FIFA World Cup.', 'The best 3 midfielders, Iniesta, Ozil, and Kante, were identified based on analysis and plotting of their respective positions as controller, playmaker, and beast.', 'The best 3 attackers, Ronaldo, Messi, and Lewandowski, were chosen based on their respective positions of left wing attacker, right wing attacker, and striker.', 'The chapter covers the process of selecting the best goalkeeper for the World Cup team based on specific weighted characteristics such as shot stopper and sweeper, with Manuel Neuer identified as the best choice.', 'By analyzing shot stopping and sweeping scores, Manuel Neuer is identified as the best goalkeeper for the FIFA World Cup.', 'Using attributes and positions such as LCB, RCB, LWB, and RWB to predict and select the four best defenders for a 4-3-3 formation.', 'Categorizing midfielders as controller, playmaker, and 
beast, assigning weightage to characteristics to identify the best players for each category.', 'The chapter emphasizes the importance of data cleaning and visualization using Matplotlib and Seaborn libraries to gain statistical insights and analyze the dataset.']}, {'end': 36203.952, 'segs': [{'end': 34797.473, 'src': 'embed', 'start': 34772.43, 'weight': 0, 'content': [{'end': 34781.276, 'text': 'Experts and the World Bank point to trade wars and other global financial risks as reasons for a dim forecast.', 'start': 34772.43, 'duration': 8.846}, {'end': 34789.783, 'text': 'In economics, a recession is a business cycle contraction where there is a general decline in economic activity.', 'start': 34781.917, 'duration': 7.866}, {'end': 34797.473, 'text': 'Recessions generally occur when there is a widespread drop in spending which leads to an adverse demand shock.', 'start': 34790.484, 'duration': 6.989}], 'summary': 'Trade wars and global financial risks lead to economic decline and recessions.', 'duration': 25.043, 'max_score': 34772.43, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA34772430.jpg'}, {'end': 36138.654, 'src': 'embed', 'start': 36113.733, 'weight': 4, 'content': [{'end': 36119.955, 'text': 'and with a possible long-term shift towards home-based work for a large segment of the population,', 'start': 36113.733, 'duration': 6.222}, {'end': 36124.776, 'text': 'large-scale deployment of public 5G access points is likely to be curtailed.', 'start': 36119.955, 'duration': 4.821}, {'end': 36134.119, 'text': 'Now the cloud capacity, especially from hyperscale vendors such as AWS, Microsoft Azure, Google Compute, Oracle Cloud and IBM Cloud,', 'start': 36125.197, 'duration': 8.922}, {'end': 36136.1, 'text': 'is what is going to get us through the storm.', 'start': 36134.119, 'duration': 1.981}, {'end': 36138.654, 'text': 'With a remote home-based workforce.', 'start': 36136.772, 
'duration': 1.882}], 'summary': 'Potential decrease in public 5g deployment due to shift to home-based work, reliance on cloud capacity for remote workforce', 'duration': 24.921, 'max_score': 36113.733, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA36113733.jpg'}], 'start': 34732.492, 'title': 'Analytics in economic downturns', 'summary': 'Discusses the impact of recessions, analytics in mitigating recession impact, job opportunities in analytics, and the role of analytics in pandemic response, including iot analytics, urban data dashboards, and surge in online activities.', 'chapters': [{'end': 35401.259, 'start': 34732.492, 'title': 'Analytics in recessions', 'summary': 'Discusses the impact of recessions on various sectors, the lessons learned from the great recession, the relevance of analytics skills, and the role of analytics in mitigating the impact of a global recession.', 'duration': 668.767, 'highlights': ['The impact of the ongoing recession on the analytics domain may not be as severe as its impact on various other sectors, and the broader quantifiable impact of the recession on salaries across analytics can only be measured after a few months of the recession. The ongoing recession may have a lesser impact on the analytics domain compared to other sectors, and the precise effect on salaries in analytics will become measurable after a few months.', 'The 2007 financial contagion and the subsequent Great Recession resulted in the loss of tens of millions of jobs, with 7 million in the United States alone, and the recovery from the Great Recession presented challenges for low-skilled workers and those without college credentials. 
The 2007 financial contagion and the Great Recession led to the loss of tens of millions of jobs, particularly impacting low-skilled workers and those without college credentials.', "Nearly two-thirds of the 8.4 million jobs created during the recovery from the Great Recession required at least a bachelor's degree, and signs of the demand for digital skills appeared early, projecting a shortfall of 200,000 data scientists and 1.5 million analytics-enabled managers. The majority of jobs created during the Great Recession recovery demanded at least a bachelor's degree, and there was an early indication of a demand for digital skills, projecting a significant shortfall in data scientists and analytics-enabled managers.", 'The majority of job openings demanded analytics-enabled professionals, representing nearly 70% of all job postings across various sectors including finance, health, manufacturing, and retail. A majority of job openings required analytics-enabled professionals, accounting for almost 70% of job postings across diverse sectors such as finance, health, manufacturing, and retail.', "Employers identified the relevance and need for analytic skills as crucial, with 94% stating the skills' relevance to current job openings and 89% finding it very problematic to find qualified applicants in this field. 
Employers emphasized the critical relevance of analytic skills, with 94% acknowledging their importance in current job openings and 89% expressing difficulty in finding qualified applicants in this field."]}, {'end': 35688.451, 'start': 35402.003, 'title': 'Analytics: thriving in economic downturns', 'summary': 'Highlights how analytics helps companies thrive in economic downturns by focusing on customer retention, revenue maximization, expense reduction, and job opportunities in the analytics market, amidst the covid-19 impact on the global economy.', 'duration': 286.448, 'highlights': ['Analytics helps companies thrive in economic downturns by focusing on customer retention and revenue maximization. Identifying customer preferences through data analytics can lead to increased profit from a direct-to-customer approach.', 'Expense reduction and revenue maximization are crucial during economic downturns, and analytics tools can help automate tasks and identify new revenue opportunities. Advanced analytics tools can help organizations automate tasks and identify new revenue opportunities, leading to increased customer retention and profitability.', 'The analytics market remains afloat amidst the COVID-19 impact, with growing demands for data analytics and data science jobs across various sectors. 
The demand for data analytics and data science jobs is evident across sectors like finance, healthcare, information, manufacturing, professional services, and retail trade, amidst the economic impact of COVID-19.']}, {'end': 35931.168, 'start': 35688.451, 'title': 'Impact of data analytics on business and biomedical research', 'summary': 'Highlights the 11% increase in data science and analytics job titles during a time of falling job postings, the significant impact of analytics in business during economic downturns, and the crucial role of data analytics and big data in accelerating biomedical research and the development of a vaccine for the coronavirus.', 'duration': 242.717, 'highlights': ['Data science and analytics job titles were on track to increase by 11% during a time of falling job postings. This indicates the growing demand for data professionals despite economic challenges.', 'Analytics in business can help save time and money, with advanced tools aiding in automation and customer engagement. This demonstrates the tangible benefits of analytics in improving business efficiency and performance.', 'Data analytics played a significant role in insulating organizations from the effects of the economic recession, leading to profitable growth and post-recession recovery. This highlights the resilience and positive impact of data analytics in sustaining businesses during economic downturns.', 'The use of analytics, machine learning, and big data is accelerating the development of a vaccine for the coronavirus and biomedical research in general. This emphasizes the critical role of data analytics and big data in advancing biomedical research and addressing global health challenges.', 'The interdisciplinary nature of biomedical research requires the collaboration of data professionals, biologists, statisticians, and other experts to drive progress. 
This underscores the essential role of data professionals in interdisciplinary collaboration for impactful biomedical research.']}, {'end': 36203.952, 'start': 35931.168, 'title': 'Leveraging data and technology for pandemic response', 'summary': 'Discusses the utilization of iot analytics, urban data dashboards, social media data science and ai, cloud computing, and the surge in demand for online learning, fitness, and digital consumption in response to the pandemic.', 'duration': 272.784, 'highlights': ['The surge in demand for online learning, fitness, and digital consumption The demand for online learning, fitness, social media and digital consumption has increased exponentially across the board, with video conferencing apps experiencing a 72% increase in time spent and over 100% rise in average user count.', 'Utilization of IoT analytics and machine learning for supply chain management Supply chain managers are turning to new ways of managing the supply chain using IoT analytics and machine learning, which will become the foundation for gaining insight into market trends and erratic supply and demand.', 'Utilizing urban data dashboards for monitoring mobility and social distancing The Newcastle University Urban Observatory has developed an urban data dashboard to monitor movement within a city in real time, using thousands of sensors and data sharing agreements to understand the impact of social distancing measures and monitor various variables, including energy consumption, air quality, and climate.', "Leveraging social media data science and AI to combat the pandemic Data science and AI are being used to reach target audiences, identify at-risk individuals, detect fake news, and analyze people's movement through social media platforms, offering a valuable tool in controlling the pandemic situation.", 'The role of cloud computing in making remote work more convenient Cloud computing, particularly from hyperscale vendors, is crucial for making remote work more 
convenient, especially with a potential long-term shift towards home-based work, and will be essential for running core business functions and refactoring legacy applications.']}], 'duration': 1471.46, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA34732492.jpg', 'highlights': ['The demand for data analytics and data science jobs is evident across sectors like finance, healthcare, information, manufacturing, professional services, and retail trade, amidst the economic impact of COVID-19.', 'The majority of job openings demanded analytics-enabled professionals, accounting for almost 70% of job postings across diverse sectors such as finance, health, manufacturing, and retail.', 'The use of analytics, machine learning, and big data is accelerating the development of a vaccine for the coronavirus and biomedical research in general.', 'The surge in demand for online learning, fitness, social media and digital consumption has increased exponentially across the board, with video conferencing apps experiencing a 72% increase in time spent and over 100% rise in average user count.', 'The Newcastle University Urban Observatory has developed an urban data dashboard to monitor movement within a city in real time, using thousands of sensors and data sharing agreements to understand the impact of social distancing measures and monitor various variables, including energy consumption, air quality, and climate.']}, {'end': 37770.169, 'segs': [{'end': 36261.033, 'src': 'embed', 'start': 36203.952, 'weight': 0, 'content': [{'end': 36207.554, 'text': 'data science and data analytics will go up tremendously.', 'start': 36203.952, 'duration': 3.602}, {'end': 36211.816, 'text': 'and then, finally, we have financial technology (FinTech) and financial services.', 'start': 36207.554, 'duration': 4.262}, {'end': 36218.119, 'text': 'Now, banking and financial services are essential for everyone, with people working from home and maintaining social 
distancing.', 'start': 36211.816, 'duration': 6.303}, {'end': 36221.921, 'text': 'This would be a great opportunity for FinTech startups.', 'start': 36218.479, 'duration': 3.442}, {'end': 36228.344, 'text': 'Online services that we can pay for through FinTech now cover practically every industry imaginable.', 'start': 36221.921, 'duration': 6.423}, {'end': 36233.132, 'text': 'Now, if I had to talk about the data analytics trends in a nutshell,', 'start': 36229.15, 'duration': 3.982}, {'end': 36238.273, 'text': "I'd like to say that the crisis is in its early days and it is hard to predict where it will lead.", 'start': 36233.132, 'duration': 5.141}, {'end': 36248.977, 'text': 'However, according to a recent survey report by Burtch Works and the International Institute for Analytics, the scenario for employees in data analytics,', 'start': 36238.834, 'duration': 10.143}, {'end': 36253.259, 'text': 'big data, data science and machine learning is still very promising.', 'start': 36248.977, 'duration': 4.282}, {'end': 36261.033, 'text': 'Demand for data-oriented occupations and skill sets skyrocketed in 2019 and again this year.', 'start': 36253.829, 'duration': 7.204}], 'summary': 'Data analytics and fintech are booming, with a significant rise in demand for data-oriented occupations and skill sets.', 'duration': 57.081, 'max_score': 36203.952, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA36203952.jpg'}, {'end': 36397.701, 'src': 'embed', 'start': 36372.131, 'weight': 12, 'content': [{'end': 36376.914, 'text': 'Now, for an entry-level data analyst in India, the average salary ranges from 1,72,000 rupees to 7,16,000 rupees.', 'start': 36372.131, 'duration': 4.783}, {'end': 36392.378, 'text': 'with a bonus of 6,000 to 10,000 rupees and profit sharing of 1,500 to 10,000 rupees.', 'start': 36389.597, 'duration': 2.781}, {'end': 36394.279, 'text': 'The total pay now comes to roughly 1,74,000 to 7,53,000 rupees.', 'start': 
36392.398, 'duration': 1.881}, {'end': 36397.701, 'text': 'That is a very good amount for an entry-level data analyst.', 'start': 36394.279, 'duration': 3.422}], 'summary': 'Entry-level data analyst in india earns 1,72,000 to 7,16,000 rupees, with total pay reaching 1,74,000 to 7,53,000, plus bonuses and profit sharing.', 'duration': 25.57, 'max_score': 36372.131, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA36372131.jpg'}, {'end': 36489.313, 'src': 'embed', 'start': 36436.408, 'weight': 9, 'content': [{'end': 36439.171, 'text': 'So in India, the salary ranges from 3,63,000 rupees to 19,70,000 rupees.', 'start': 36436.408, 'duration': 2.763}, {'end': 36442.313, 'text': "That's a heck of an amount you're getting for an experienced data analyst.", 'start': 36439.191, 'duration': 3.122}, {'end': 36456.156, 'text': 'So the bonus ranges from 14,000 rupees to 2 lakh rupees or 2.5 lakh rupees, and profit sharing is around 12,500 rupees.', 'start': 36447.913, 'duration': 8.243}, {'end': 36462.758, 'text': 'So the total pay varies from 3.7 or 4 lakhs per annum to 20 lakhs per annum.', 'start': 36456.156, 'duration': 6.602}, {'end': 36467.14, 'text': 'So if you have a look at the figures for an experienced data analyst in the US,', 'start': 36462.758, 'duration': 4.382}, {'end': 36476.586, 'text': 'the salary ranges from $45,000 to $100,000 and the bonus ranges from $1,000 to $10,000, whereas the profit sharing is $1,000 to $12,000,', 'start': 36467.14, 'duration': 9.446}, {'end': 36484.428, 'text': 'which makes the total pay around $40,000 to $45,000 at the low end, up to $100,000.', 'start': 36476.586, 'duration': 7.842}, {'end': 36489.313, 'text': "So let's have a look at the various skills which are required to become a data analyst at number one.", 'start': 36484.429, 'duration': 4.884}], 'summary': "Experienced data analysts in india earn 3.7-20 lakhs inr, while in the us, it's $40,000-$100,000, with varying bonuses and 
profits.", 'duration': 52.905, 'max_score': 36436.408, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA36436408.jpg'}, {'end': 36593.567, 'src': 'embed', 'start': 36571.75, 'weight': 14, 'content': [{'end': 36582.46, 'text': 'So, skills like SQL, JavaScript and Python, and tools like SAS, Tableau, Power BI and various other visualization tools.', 'start': 36571.75, 'duration': 10.71}, {'end': 36587.405, 'text': 'And you also need to have a good knowledge of Hadoop and the big data ecosystem.', 'start': 36582.6, 'duration': 4.805}, {'end': 36593.567, 'text': "So now let's talk about the different job descriptions or the roles of a data analyst.", 'start': 36587.985, 'duration': 5.582}], 'summary': 'Data analysts need skills in sql, javascript, python, sas, tableau, power bi, hadoop, and big data ecosystem, as well as various visualization tools.', 'duration': 21.817, 'max_score': 36571.75, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA36571750.jpg'}, {'end': 36645.869, 'src': 'embed', 'start': 36617.822, 'weight': 15, 'content': [{'end': 36621.124, 'text': 'create and maintain relational databases and data systems.', 'start': 36617.822, 'duration': 3.302}, {'end': 36624.206, 'text': "So let's get into the details of the job description.", 'start': 36621.524, 'duration': 2.682}, {'end': 36627.007, 'text': 'What exactly do all these terms mean?', 'start': 36624.246, 'duration': 2.761}, {'end': 36627.948, 'text': 'So, first of all,', 'start': 36627.107, 'duration': 0.841}, {'end': 36637.879, 'text': 'one of the first and most important roles of a data analyst is to determine the organizational goals, which involves working with IT teams, management as well as data scientists.', 'start': 36628.548, 'duration': 9.331}, {'end': 36645.869, 'text': 'This is done in order to get a good understanding of what the organization sets out to do and what will help them 
to make the business profitable.', 'start': 36638.3, 'duration': 7.569}], 'summary': 'Data analyst role involves working with IT teams, management, and data scientists to determine organizational goals and make the business profitable.', 'duration': 28.047, 'max_score': 36617.822, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA36617822.jpg'}, {'end': 37239.012, 'src': 'embed', 'start': 37215.382, 'weight': 6, 'content': [{'end': 37222.847, 'text': 'they also design, create and maintain relational databases and data systems, as that also comes as a part of the job.', 'start': 37215.382, 'duration': 7.465}, {'end': 37226.509, 'text': "So let's see in detail what exactly these roles and responsibilities are.", 'start': 37222.847, 'duration': 3.662}, {'end': 37229.67, 'text': 'So, starting with determining organizational goals,', 'start': 37227.069, 'duration': 2.601}, {'end': 37236.932, 'text': 'one of the first and most important roles of a data analyst is to determine the organizational goals, which involves working with IT teams,', 'start': 37229.67, 'duration': 7.262}, {'end': 37239.012, 'text': 'management and data scientists.', 'start': 37236.932, 'duration': 2.08}], 'summary': 'Data analysts determine organizational goals and work with IT teams, management, and data scientists to fulfill their roles and responsibilities.', 'duration': 23.63, 'max_score': 37215.382, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA37215382.jpg'}, {'end': 37341.408, 'src': 'embed', 'start': 37315.95, 'weight': 1, 'content': [{'end': 37322.639, 'text': 'It is the process of evaluating data using analytical and logical reasoning to examine each component of the data provided.', 'start': 37315.95, 'duration': 6.689}, {'end': 37326.902, 'text': 'One uses standard statistical tools to analyze and interpret the data.', 'start': 37323.24, 'duration': 3.662}, {'end': 37333.424, 
'text': 'Now there are various tools and programming languages used in this analysis, such as R, Python,', 'start': 37327.242, 'duration': 6.182}, {'end': 37336.626, 'text': 'SAS and other visualization tools as well.', 'start': 37333.424, 'duration': 3.202}, {'end': 37341.408, 'text': 'Another major role of any data analyst is pinpointing the trends and patterns.', 'start': 37337.226, 'duration': 4.182}], 'summary': 'Data analysis involves evaluating data using statistical tools, programming languages like r and python, and identifying trends and patterns.', 'duration': 25.458, 'max_score': 37315.95, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA37315950.jpg'}], 'start': 36203.952, 'title': 'Data analytics and fintech trends', 'summary': 'Explores the rising demand for data analytics and fintech, essential skills for a data analyst, roles & responsibilities, and the top 10 data analytics tools. It indicates significant job openings and competitive salaries for data analysts, and highlights the increasing demand for data analysts due to the growing volume of data being generated daily.', 'chapters': [{'end': 36489.313, 'start': 36203.952, 'title': 'Data analytics and fintech trends', 'summary': 'Explores the rising demand for data analytics and fintech in various industries, with a focus on the promising job market for data analysts, indicating significant job openings and competitive salaries in both the us and india.', 'duration': 285.361, 'highlights': ['The job market for data analysts is promising, with over 405,000 job openings in New York alone, and a salary range of $40,000 to $100,000 in the US, and 1,72,000 to 19,70,000 rupees in India, making it a lucrative career choice.', 'The demand for data-oriented occupations and skill sets has skyrocketed, leading to a significant increase in job opportunities for data analysts, big data professionals, and machine learning experts in both 
2019 and the current year.', 'Financial Tech and financial services are on the rise, especially with the shift towards remote work and social distancing, presenting a great opportunity for FinTech startups to thrive and cater to the increasing demand for online services across various industries.', 'The average salary for data analysts in the US is $83,878, with a bonus range of $1,000 to $10,000, and profit sharing of $600 to $6,000, while in India, the total pay for entry-level data analysts ranges from 1,74,000 to 7,53,000 rupees, making it a lucrative career choice in both countries.', 'The long-term trend of businesses transitioning their operations to rely more heavily on data-related sciences provides job security for data analysts and data engineers, making it a relatively stable career choice even during economic downturns and pandemics.']}, {'end': 37154.949, 'start': 36489.333, 'title': 'Data analyst skills and job description', 'summary': 'Discusses the essential skills required for a data analyst, including analytical, communication, critical thinking, attention to detail, mathematical, and technical skills, along with the job description, requirements for a resume, and job opportunities at various companies. the transcript also highlights the increasing demand for data analysts due to the growing volume of data being generated daily.', 'duration': 665.616, 'highlights': ['The job openings for data analysts are high in demand, with companies such as Dropbox, Adobe, Walmart, LinkedIn, Red Hat, IBM, Uber, and Chase actively hiring for these roles. Mentions the high demand for data analysts and lists several companies actively hiring for these roles.', 'Data analysts need to have a good understanding of the big data ecosystem, including skills in data warehousing, data cleaning, visualization, and machine learning, which can land them good job opportunities. 
Emphasizes the importance of technical skills such as understanding the big data ecosystem, data warehousing, data cleaning, visualization, and machine learning for data analysts to secure good job opportunities.', 'The chapter outlines the essential skills for a data analyst, including analytical, communication, critical thinking, attention to detail, mathematical, and technical skills. Summarizes the key skills required for a data analyst, covering analytical, communication, critical thinking, attention to detail, mathematical, and technical skills.', 'Data is increasingly shaping the systems that we interact with every day, generating 2.5 quintillion bytes of data every day, highlighting the significance of data analysts in processing and interpreting such vast amounts of data. Highlights the growing volume of data being generated daily and emphasizes the significance of data analysts in processing and interpreting such vast amounts of data.', 'The transcript provides details on the job description of a data analyst, which includes determining organizational goals, mining, cleaning, and analyzing data, interpreting results, and designing and maintaining relational databases and data systems. Summarizes the job description of a data analyst, covering determining organizational goals, mining, cleaning, analyzing data, interpreting results, and designing and maintaining databases and data systems.']}, {'end': 37430.071, 'start': 37155.504, 'title': 'Roles & responsibilities of a data analyst', 'summary': 'Explains the roles and responsibilities of a data analyst, including determining organizational goals, mining and cleaning data, analyzing trends and patterns, creating reports with clear visualization, and maintaining databases and data systems.', 'duration': 274.567, 'highlights': ['Data analysts spend time finding trends, correlation and patterns in complicated data sets, which helps in predicting business performance and guiding strategic decisions. 
Data analysts spend significant time pinpointing trends, correlations, and patterns in complicated data sets, helping to predict business performance and guide strategic decisions.', 'A crucial role of a data analyst is to determine the organizational goals, involving working with IT teams, management, and data scientists to understand the business and act accordingly. Determining organizational goals is a crucial role of a data analyst, involving working with IT teams, management, and data scientists to understand the business and act accordingly.', 'Data analysts are responsible for cleaning and pruning data, ensuring that the analysis rests on clean data, which involves removing data that may distort the analysis or standardizing it into a single format. Data analysts are responsible for cleaning and pruning data, ensuring that the analysis rests on clean data, which involves removing data that may distort the analysis or standardizing it into a single format.', 'Data analysts use analytical and logical reasoning to examine each component of the data, employing standard statistical tools to analyze and interpret the data, using various programming languages and visualization tools. Data analysts use analytical and logical reasoning to examine each component of the data, employing standard statistical tools to analyze and interpret the data, using various programming languages and visualization tools.', 'Data analysts create reports with clear visualization, using eye-catching, high-quality charts and graphs to present their findings in a clear and concise way, which helps companies monitor their online business and make informed decisions. 
Data analysts create reports with clear visualization, using eye-catching, high-quality charts and graphs to present their findings in a clear and concise way, which helps companies monitor their online business and make informed decisions.']}, {'end': 37770.169, 'start': 37430.091, 'title': 'Top 10 data analytics tools', 'summary': 'Explores the top 10 data analytics tools, including splunk, talend, qlikview, and apache spark, highlighting their key features, industry recognition, and prominent users, with splunk being used by 92 out of the fortune 100 and apache spark being the most active apache project at the moment.', 'duration': 340.078, 'highlights': ['Splunk Splunk, named a visionary in the 2020 magic quadrant for APM by Gartner, offers three products for all divisions, with 92 out of the Fortune 100 using it, and pricing options based on predictive pricing, infrastructure-based pricing, and rapid adoption.', "Talend Talend, recognized as a leader in Gartner's Magic Quadrant for data integration tools, offers five products with different functionalities, used by small startups to multinational companies such as Aldo AB, Euronext, and AstraZeneca.", 'QlikView QlikView, named a leader in the Gartner Magic Quadrant 2020 for analytics and BI platforms, provides a variety of products and services for data integration, trusted by more than 50,000 customers worldwide, including Cisco, NHS, KitchenAid, and Samsung.', 'Apache Spark Apache Spark, a cluster computing framework, is the most active Apache project at the moment, with companies like Oracle, Hortonworks, Verizon, and Visa using it for real-time computation of data with ease of use and speed.']}], 'duration': 1566.217, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA36203952.jpg', 'highlights': ['The job market for data analysts is promising, with over 405,000 job openings in New York alone, and a salary range of $40,000 to $100,000 in the US, and 
1,72,000 to 19,70,000 rupees in India, making it a lucrative career choice.', 'The demand for data-oriented occupations and skill sets has skyrocketed, leading to a significant increase in job opportunities for data analysts, big data professionals, and machine learning experts in both 2019 and the current year.', 'Financial Tech and financial services are on the rise, especially with the shift towards remote work and social distancing, presenting a great opportunity for FinTech startups to thrive and cater to the increasing demand for online services across various industries.', 'The average salary for data analysts in the US is $83,878, with a bonus range of $1,000 to $10,000, and profit sharing of $600 to $6,000, while in India, the total pay for entry-level data analysts ranges from 1,74,000 to 7,53,000 rupees, making it a lucrative career choice in both countries.', 'The long-term trend of businesses transitioning their operations to rely more heavily on data-related sciences provides job security for data analysts and data engineers, making it a relatively stable career choice even during economic downturns and pandemics.', 'Data analysts are in high demand, with companies such as Dropbox, Adobe, Walmart, LinkedIn, Red Hat, IBM, Uber, and Chase actively hiring for these roles.', 'Data analysts need to have a good understanding of the big data ecosystem, including skills in data warehousing, data cleaning, visualization, and machine learning, which can land them good job opportunities.', 'The chapter outlines the essential skills for a data analyst, including analytical, communication, critical thinking, attention to detail, mathematical, and technical skills.', 'Data is increasingly shaping the systems that we interact with every day, generating 2.5 quintillion bytes of data every day, highlighting the significance of data analysts in processing and interpreting such vast amounts of data.', 'The transcript provides details on the job 
description of a data analyst, which includes determining organizational goals, mining, cleaning, and analyzing data, interpreting results, and designing and maintaining relational databases and data systems.', 'Data analysts spend time finding trends, correlations, and patterns in complicated data sets, which helps in predicting business performance and guiding strategic decisions.', 'A crucial role of a data analyst is to determine the organizational goals, involving working with IT teams, management, and data scientists to understand the business and act accordingly.', 'Data analysts are responsible for cleaning and pruning data, ensuring that the analysis rests on clean data, which involves removing data that may distort the analysis or standardizing it into a single format.', 'Data analysts use analytical and logical reasoning to examine each component of the data, employing standard statistical tools to analyze and interpret the data, using various programming languages and visualization tools.', 'Data analysts create reports with clear visualization, using eye-catching, high-quality charts and graphs to present their findings in a clear and concise way, which helps companies monitor their online business and make informed decisions.', 'Splunk, named a visionary in the 2020 magic quadrant for APM by Gartner, offers three products for all divisions, with 92 out of the Fortune 100 using it, and pricing options based on predictive pricing, infrastructure-based pricing, and rapid adoption.', "Talend, recognized as a leader in Gartner's Magic Quadrant for data integration tools, offers five products with different functionalities, used by small startups to multinational companies such as Aldo AB, Euronext, and AstraZeneca.", 'QlikView, named a leader in the Gartner Magic Quadrant 2020 for analytics and BI platforms, provides a variety of products and services for data integration, trusted by more than 50,000 customers worldwide, including Cisco, NHS, KitchenAid, and 
Samsung.', 'Apache Spark, a cluster computing framework, is the most active Apache project at the moment, with companies like Oracle, Hortonworks, Verizon, and Visa using it for real-time computation of data with ease of use and speed.']}, {'end': 39241.239, 'segs': [{'end': 37798.586, 'src': 'embed', 'start': 37770.529, 'weight': 13, 'content': [{'end': 37774.191, 'text': 'The machine learning component is handy when it comes to big data processing.', 'start': 37770.529, 'duration': 3.662}, {'end': 37777.692, 'text': "Now let's take a look at the next tool, which is Power BI.", 'start': 37774.81, 'duration': 2.882}, {'end': 37787.018, 'text': 'So Power BI is a Microsoft product used for business analytics, named a leader for the 13th consecutive year in the Gartner 2020 Magic Quadrant.', 'start': 37777.692, 'duration': 9.326}, {'end': 37792.622, 'text': 'It provides interactive visualizations with self-service business intelligence capabilities,', 'start': 37787.378, 'duration': 5.244}, {'end': 37798.586, 'text': 'where end users can create dashboards and reports by themselves without having to depend on anybody.', 'start': 37792.622, 'duration': 5.964}], 'summary': 'Power BI, a Microsoft product, is a leader in business analytics, as named in the Gartner 2020 Magic Quadrant for the 13th consecutive year.', 'duration': 28.057, 'max_score': 37770.529, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA37770529.jpg'}, {'end': 37842.602, 'src': 'embed', 'start': 37814.501, 'weight': 12, 'content': [{'end': 37819.543, 'text': 'few of them are free for a certain period of time, and then you have to buy the licensed versions.', 'start': 37814.501, 'duration': 5.042}, {'end': 37827.786, 'text': 'Multinational organizations such as Adobe, Heathrow, Walmart, and GE Healthcare are using Power BI to achieve powerful results from their data.', 'start': 37819.543, 'duration': 8.243}, {'end': 37838.017, 'text': 'Power BI 
has recently come up with solutions such as Azure + Power BI and Office 365 + Power BI to help users analyze the data,', 'start': 37828.386, 'duration': 9.631}, {'end': 37842.602, 'text': 'connect the data, and protect the data across various other platforms.', 'start': 37838.017, 'duration': 4.585}], 'summary': 'Power BI is used by multinational organizations like Adobe and GE for powerful data analysis. New solutions like Azure + Power BI and Office 365 + Power BI have been introduced.', 'duration': 28.101, 'max_score': 37814.501, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA37814501.jpg'}, {'end': 38477.915, 'src': 'embed', 'start': 38451.078, 'weight': 3, 'content': [{'end': 38456.682, 'text': 'Machine learning and predictive modeling are quickly becoming two of the hottest topics in the field of data analysis.', 'start': 38451.078, 'duration': 5.604}, {'end': 38463.988, 'text': 'While not every analyst works with machine learning, the tools and concepts are important to learn in order to get ahead in the field.', 'start': 38457.243, 'duration': 6.745}, {'end': 38469.373, 'text': "You'll need to have your statistical programming skills down first to advance with machine learning.", 'start': 38464.349, 'duration': 5.024}, {'end': 38477.915, 'text': 'If you want to stand out from other data analysts, you need to know machine learning techniques such as supervised machine learning, decision trees,', 'start': 38469.853, 'duration': 8.062}], 'summary': 'Machine learning and predictive modeling are crucial for data analysts to advance in the field, requiring strong statistical programming skills and knowledge of techniques like supervised machine learning and decision trees.', 'duration': 26.837, 'max_score': 38451.078, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA38451078.jpg'}, {'end': 38576.885, 'src': 'embed', 'start': 
38514.006, 'weight': 0, 'content': [{'end': 38520.852, 'text': 'Investigation and verification skills are required to ensure you are providing the right information for the right business problem.', 'start': 38514.006, 'duration': 6.846}, {'end': 38527.457, 'text': 'Interpreting data results involves determining the extent and the cause of any issues involving bad data.', 'start': 38521.392, 'duration': 6.065}, {'end': 38531.941, 'text': 'The more advanced in analytics you get, the more math and statistics you will need.', 'start': 38527.978, 'duration': 3.963}, {'end': 38538.026, 'text': 'So good analytical problem-solving knowledge, both practical and theoretical, is preferred.', 'start': 38532.541, 'duration': 5.485}, {'end': 38542.111, 'text': 'Now, coming to the third point, we have effective communication.', 'start': 38538.729, 'duration': 3.382}, {'end': 38549.657, 'text': "Data visualization and presentation skills go hand in hand, but presentation doesn't always come naturally to everyone, and that's okay.", 'start': 38542.111, 'duration': 7.546}, {'end': 38555.861, 'text': 'Even seasoned presenters feel nervous at times. As with anything else, start with practice,', 'start': 38550.077, 'duration': 5.784}, {'end': 38558.922, 'text': 'then practice some more until you get into your groove.', 'start': 38556.321, 'duration': 2.601}, {'end': 38570.324, 'text': 'Companies searching for a strong data analyst are looking for someone who can clearly and fluently translate their technical findings to a non-technical team such as the marketing or the sales department.', 'start': 38559.222, 'duration': 11.102}, {'end': 38576.885, 'text': 'A data analyst must enable the business to make decisions by arming them with quantified insights,', 'start': 38570.764, 'duration': 6.121}], 'summary': 'Data analysts need investigation skills, statistical knowledge, and effective communication for translating technical findings to non-technical teams.', 'duration': 
62.879, 'max_score': 38514.006, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA38514006.jpg'}, {'end': 39021.867, 'src': 'embed', 'start': 38992.433, 'weight': 11, 'content': [{'end': 38995.897, 'text': 'and this cycle continues until a deliverable product is ready.', 'start': 38992.433, 'duration': 3.464}, {'end': 39003.467, 'text': 'So the prototype has been through a few iterations of corrections and improvements and now is finally looking ready.', 'start': 38996.498, 'duration': 6.969}, {'end': 39016.811, 'text': "Now what Stuart does is analyze the gathered data of the app and use data visualization tools like Tableau or Power BI to make reports that provide insight into the app's performance.", 'start': 39004.349, 'duration': 12.462}, {'end': 39021.867, 'text': 'These reports could be as simple as charts in a document explaining the data,', 'start': 39017.486, 'duration': 4.381}], 'summary': 'Prototype iterated, data analyzed using Tableau or Power BI for app insights.', 'duration': 29.434, 'max_score': 38992.433, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA38992433.jpg'}, {'end': 39116.049, 'src': 'embed', 'start': 39085.984, 'weight': 10, 'content': [{'end': 39090.465, 'text': 'Martha, her CTO, and the hospital are all happy with the app.', 'start': 39085.984, 'duration': 4.481}, {'end': 39093.526, 'text': 'It will help their customers and staff immensely.', 'start': 39090.865, 'duration': 2.661}, {'end': 39099.23, 'text': "So let's move on to the next section where we summarize all of the roles and responsibilities of a business analyst.", 'start': 39094.144, 'duration': 5.086}, {'end': 39100.551, 'text': 'And so they are as follows.', 'start': 39099.41, 'duration': 1.141}, {'end': 39110.022, 'text': "Business analysts act as liaisons for their clients and work with technical teams to deliver products and services that will alleviate the 
client's problems.", 'start': 39101.232, 'duration': 8.79}, {'end': 39116.049, 'text': 'It helps to think of them as people who bridge the gap between the technical IT people and the client.', 'start': 39110.706, 'duration': 5.343}], 'summary': 'App satisfies hospital and cto, aids customers & staff. business analyst acts as liaison, bridging gap between it and client.', 'duration': 30.065, 'max_score': 39085.984, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA39085984.jpg'}], 'start': 37770.529, 'title': 'Essential data analysis and business analyst skills', 'summary': 'Provides an overview of popular data analysis tools and emphasizes the essential skills required to become a successful data analyst, including statistics, programming languages like r and python, data warehousing, bi tools, data cleaning and visualization, hadoop, machine learning, analytical problem-solving, effective communication, critical thinking, and industry knowledge. It also highlights the demand for business analysts, with 87,000 vacancies in the united states and 4,199 in india, as well as the average base salaries of 7.5 lakh rupees in india and $75,000 in the united states.', 'duration': 1470.71, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA37770529.jpg', 'chapters': [{'end': 38077.471, 'start': 37770.529, 'title': 'Data analysis tools overview', 'summary': 'Provides an overview of popular data analysis tools including power bi, tableau, rapidminer, knime, and microsoft excel, highlighting key features, industry usage, and recent advancements.', 'duration': 306.942, 'highlights': ["Power BI named as a leader for the 13th consecutive year in the Gartner 2020 Magic Quadrant, used by multinational organizations like Adobe, Heathrow, and GE Healthcare. 
Power BI's leadership position and usage by renowned organizations demonstrate its effectiveness in business analytics.", "KNIME being used by companies such as Siemens, Novartis, and Deutsche Telekom, enabling data analysis and insights without prior programming knowledge. KNIME's adoption by major companies and its user-friendly interface showcase its accessibility and utility for data analytics.", "RapidMiner named a visionary in the 2020 Gartner Magic Quadrant, utilized by companies like BMW, Hewlett Packard Enterprise, and Sanofi for data processing and machine learning models. RapidMiner's recognition as a visionary and its adoption by industry leaders underscore its significance in data processing and machine learning.", "Tableau recognized as a leader in the Gartner Magic Quadrant 2020, used by organizations such as Citibank, Deloitte, Skype, and Audi for data visualization and analysis. Tableau's leadership position and usage by major organizations illustrate its effectiveness in data visualization and analysis.", "Microsoft Excel's popularity and usage by organizations like McDonald's, Ikea, and Marriott highlight its widespread adoption for data analytics. The widespread usage of Microsoft Excel by major organizations emphasizes its importance as a data analytics tool."]}, {'end': 38363.483, 'start': 38077.878, 'title': 'Top skills for data analysts', 'summary': 'Discusses the essential skills required to become a successful data analyst, emphasizing the importance of statistics, programming languages like r and python, microsoft excel, data warehousing and bi tools, and data cleaning and visualization in the field, with a focus on industry relevance and practical applications.', 'duration': 285.605, 'highlights': ['Proficiency in statistics is essential for a data analyst, as it helps in understanding algorithms and data manipulation. 
A thorough grounding in statistics is essential for a data analyst, helping in understanding algorithms deeply and when to use them. Brushing up on applied science, linear algebra, real analysis, graph theory, and numerical analysis is crucial.', 'Programming skills in languages like R, Python, and SAS are necessary for data manipulation and analysis. Data analysts need proficiency in programming or scripting languages such as R, Python, and SAS, which are used to manipulate data. Wider experience in coding languages adds value to employers.', 'Advanced Microsoft Excel skills are important for data analysis, especially for smaller datasets and quick analysis. Advanced Excel methods are still widely used for smaller datasets and quicker analysis. Learning functions like VLOOKUP, pivot tables, power pivots, and macros is crucial for data analysis.', 'Data warehousing and BI tools are essential for data analysis, involving data management and manipulation skills critical for predicting future behavior. Data analysis and mining are subsets of business intelligence, which incorporate data warehousing, database management systems, and OLAP. Technologies like Hive, R, Scala, and SQL are used for data management and manipulation.', 'Data cleaning is crucial for efficient data analysis, with software tools used to remove duplicate and incorrect data, saving time and increasing efficiency. 
Data cleansing software tools are used to remove duplicate data, fix and amend badly formatted, incorrect and incomplete data for marketing lists, saving time and increasing efficiency.']}, {'end': 38793.685, 'start': 38364.019, 'title': 'Essential skills for data analysts', 'summary': 'Emphasizes the crucial skills for data analysts, highlighting the importance of data visualization, hadoop, machine learning, analytical problem-solving, effective communication, critical thinking, and industry knowledge.', 'duration': 429.666, 'highlights': ['Data Visualization Critical', 'Machine Learning High', 'Analytical Problem Solving High', 'Effective Communication High', 'Critical Thinking Medium', 'Industry Knowledge Medium']}, {'end': 39241.239, 'start': 38794.13, 'title': 'Business analyst roles & opportunities', 'summary': 'Highlights the roles and responsibilities of a business analyst using the example of stuart and martha, and provides an overview of the demand for business analysts, with 87,000 vacancies in the united states and 4,199 in india, as well as the average base salaries of 7.5 lakh rupees in india and $75,000 in the united states.', 'duration': 447.109, 'highlights': ['The demand for business analysts is high, with 87,000 vacancies in the United States and 4,199 in India, and over 3,500 companies currently hiring business analysts. Quantifiable data: 87,000 vacancies in the United States, 4,199 in India, over 3,500 companies hiring business analysts.', 'The average base salaries for business analysts are about 7.5 lakh rupees in India and about $75,000 in the United States, with potential for higher salaries. 
Quantifiable data: average base salaries of 7.5 lakh rupees in India and $75,000 in the United States, potential for higher salaries.', 'Roles and responsibilities of a business analyst include acting as a liaison for clients, gathering requirements, and ensuring their correct implementation, as well as possessing hard skills like programming and data analytics, and soft skills like communication and leading teams. Key points: acting as a liaison for clients, gathering requirements, ensuring correct implementation, hard skills like programming and data analytics, soft skills like communication and leading teams.']}], 'duration': 1470.71, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA37770529.jpg', 'highlights': ["Power BI's leadership position and usage by renowned organizations demonstrate its effectiveness in business analytics.", "KNIME's adoption by major companies and its user-friendly interface showcase its accessibility and utility for data analytics.", "RapidMiner's recognition as a visionary and its adoption by industry leaders underscores its significance in data processing and machine learning.", "Tableau's leadership position and usage by major organizations illustrate its effectiveness in data visualization and analysis.", 'The widespread usage of Microsoft Excel by major organizations emphasizes its importance as a data analytics tool.', 'Proficiency in statistics is essential for a data analyst, helping in understanding algorithms deeply and when to use them.', 'Data analysts need proficiency in programming or scripting languages such as R, Python, and SAS, which are used to manipulate data.', 'Advanced Excel methods are still widely used for smaller datasets and quicker analysis.', 'Data analysis and mining are subsets of business intelligence, which incorporate data warehousing, database management systems, and OLAP.', 'Data cleansing software tools are used to remove duplicate data, fix 
and amend badly formatted, incorrect and incomplete data for marketing lists.', 'Data Visualization Critical', 'Machine Learning High', 'Analytical Problem Solving High', 'Effective Communication High', 'The demand for business analysts is high, with 87,000 vacancies in the United States and 4,199 in India, and over 3,500 companies currently hiring business analysts.', 'Quantifiable data: average base salaries of 7.5 lakh rupees in India and $75,000 in the United States, potential for higher salaries.', 'Key points: acting as a liaison for clients, gathering requirements, ensuring correct implementation, hard skills like programming and data analytics, soft skills like communication and leading teams.']}, {'end': 41909.684, 'segs': [{'end': 39648.492, 'src': 'embed', 'start': 39621.298, 'weight': 31, 'content': [{'end': 39626.083, 'text': 'So this is yet another requirement, which we have already covered in our previous pharmaceutical examples.', 'start': 39621.298, 'duration': 4.785}, {'end': 39630.45, 'text': 'So this is also required of a data analyst in a different environment, I would say.', 'start': 39626.083, 'duration': 4.367}, {'end': 39637.801, 'text': 'And also to use statistical tools to identify, analyze, and interpret patterns and trends in complex data sets.', 'start': 39630.951, 'duration': 6.85}, {'end': 39641.028, 'text': 'Alright, so this is what a data analyst does.', 'start': 39638.607, 'duration': 2.421}, {'end': 39642.669, 'text': 'Now, moving forward,', 'start': 39641.529, 'duration': 1.14}, {'end': 39648.492, 'text': 'this is, in a nutshell, a consolidated list of the responsibilities of a data scientist versus a data analyst,', 'start': 39642.669, 'duration': 5.823}], 'summary': 'Data analyst responsibilities include using statistical tools to interpret patterns in data sets.', 'duration': 27.194, 'max_score': 39621.298, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA39621298.jpg'}, {'end': 
39741.63, 'src': 'embed', 'start': 39720.854, 'weight': 23, 'content': [{'end': 39730.298, 'text': 'So the main thing is that a data scientist has to uncover hidden opportunities from data by using machine learning tools and data science techniques,', 'start': 39720.854, 'duration': 9.444}, {'end': 39736.504, 'text': 'which we will also study further. Now, for a data analyst, gathering data from databases,', 'start': 39730.598, 'duration': 5.906}, {'end': 39741.63, 'text': 'maintaining databases, and accessing databases are the key responsibilities.', 'start': 39736.504, 'duration': 5.126}], 'summary': 'Data scientist uncovers opportunities with machine learning; data analyst manages databases.', 'duration': 20.776, 'max_score': 39720.854, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA39720854.jpg'}, {'end': 39835.315, 'src': 'embed', 'start': 39809.966, 'weight': 24, 'content': [{'end': 39815.931, 'text': 'And this is what we have been discussing throughout the entire video: uncovering the hidden opportunities in the data.', 'start': 39809.966, 'duration': 5.965}, {'end': 39825.319, 'text': 'Data scientists uncover hidden patterns in the data to identify potential problems that the business might face in the future.', 'start': 39816.491, 'duration': 8.828}, {'end': 39826.8, 'text': 'So this is forward-looking.', 'start': 39825.639, 'duration': 1.161}, {'end': 39835.315, 'text': 'Whereas the data analyst is concerned with current, day-to-day business problems. 
That is, routine business challenges.', 'start': 39827.526, 'duration': 7.789}], 'summary': 'Data scientists uncover hidden patterns to address future business problems, while data analysts focus on current day-to-day challenges.', 'duration': 25.349, 'max_score': 39809.966, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA39809966.jpg'}, {'end': 40089.229, 'src': 'embed', 'start': 40058.966, 'weight': 32, 'content': [{'end': 40065.693, 'text': 'A data analyst, on the other hand, will not put effort into explaining the data in the form of a story,', 'start': 40058.966, 'duration': 6.727}, {'end': 40073.14, 'text': 'but rather will communicate via presentations and reports using business intelligence tools like Power BI and Tableau.', 'start': 40065.693, 'duration': 7.447}, {'end': 40078.125, 'text': 'Now let us understand the eligibility criteria for a data scientist and a data analyst.', 'start': 40073.461, 'duration': 4.664}, {'end': 40082.064, 'text': 'All in all, these are also the qualifications that companies require.', 'start': 40079.082, 'duration': 2.982}, {'end': 40089.229, 'text': 'Now, this also depends upon the company one applies to, and these are just a few basic requirements of a data scientist.', 'start': 40082.464, 'duration': 6.765}], 'summary': 'Data analyst communicates via presentations using tools like Power BI and Tableau. 
data scientist qualifications depend on the company.', 'duration': 30.263, 'max_score': 40058.966, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA40058966.jpg'}, {'end': 40235.999, 'src': 'embed', 'start': 40208.384, 'weight': 29, 'content': [{'end': 40212.366, 'text': 'For example, if the responsibility is to write and execute Python scripts,', 'start': 40208.384, 'duration': 3.982}, {'end': 40216.708, 'text': 'the skill required in this case will be strong programming skills in Python.', 'start': 40212.726, 'duration': 3.982}, {'end': 40220.27, 'text': 'Another responsibility is applying machine learning and NLP techniques', 'start': 40217.208, 'duration': 3.062}, {'end': 40228.955, 'text': 'to bring order to unstructured data; the required skill in this case is knowledge of the data science ecosystem,', 'start': 40220.69, 'duration': 8.265}, {'end': 40232.537, 'text': 'like NLP, statistical techniques, neural networks, etc.', 'start': 40228.955, 'duration': 3.582}, {'end': 40235.999, 'text': 'So the skills must match the responsibilities.', 'start': 40232.897, 'duration': 3.102}], 'summary': 'Responsibilities include python scripting and ml/nlp techniques, requiring strong programming skills and data science ecosystem knowledge.', 'duration': 27.615, 'max_score': 40208.384, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA40208384.jpg'}, {'end': 40466.608, 'src': 'embed', 'start': 40441.243, 'weight': 28, 'content': [{'end': 40447.844, 'text': 'and the range of data reporting tools is very vast; which software is used depends upon the company.', 'start': 40441.243, 'duration': 6.601}, {'end': 40454.386, 'text': 'To name just a few, reporting tools like SAP, Jira, or specific software from Microsoft can be used.', 'start': 40448.325, 'duration': 6.061}, {'end': 40458.846, 'text': 'So again, this 
depends on which software the company uses for reporting.', 'start': 40454.406, 'duration': 4.44}, {'end': 40460.267, 'text': 'Programming languages.', 'start': 40459.267, 'duration': 1}, {'end': 40466.608, 'text': 'Programming languages like Python or R for better analysis, manipulation, and visualization of the data.', 'start': 40460.787, 'duration': 5.821}], 'summary': 'Various reporting tools like sap, jira, and microsoft software are used depending on the company, with programming languages like python and r for data analysis.', 'duration': 25.365, 'max_score': 40441.243, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA40441243.jpg'}, {'end': 40591.875, 'src': 'embed', 'start': 40563.925, 'weight': 20, 'content': [{'end': 40566.267, 'text': 'So you might be inclined toward and interested in that field.', 'start': 40563.925, 'duration': 2.342}, {'end': 40572.569, 'text': 'But keep in mind your educational background, skill set, and the responsibilities that you might need to perform.', 'start': 40566.747, 'duration': 5.822}, {'end': 40582.532, 'text': 'As a data analyst, you can always upskill yourself and enter the field from different backgrounds, as it is a relatively easier profile to work in.', 'start': 40573.089, 'duration': 9.443}, {'end': 40586.713, 'text': 'The qualifications are also not as stringent as those for a data scientist.', 'start': 40582.712, 'duration': 4.001}, {'end': 40591.875, 'text': 'Your chances of making inroads are higher as a data analyst than as a data scientist.', 'start': 40587.213, 'duration': 4.662}], 'summary': 'Data analyst field is easier to enter with less stringent qualifications, making it more accessible than data scientist roles.', 'duration': 27.95, 'max_score': 40563.925, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA40563925.jpg'}, {'end': 41212.488, 'src': 'embed', 'start': 41150.06, 'weight': 4, 
'content': [{'end': 41153.864, 'text': 'The first question here is: how do you handle missing values in a data set?', 'start': 41150.06, 'duration': 3.804}, {'end': 41160.791, 'text': "If you've handled data before, then you know that missing values are one of the most commonly found problems in a data set.", 'start': 41154.444, 'duration': 6.347}, {'end': 41170.349, 'text': 'To handle this, the first method is list-wise deletion, where an entire record is excluded from analysis if any single value is missing.', 'start': 41161.332, 'duration': 9.017}, {'end': 41179.995, 'text': 'The second method is average imputation, also called mean imputation, where you fill in the missing values with the average of that particular variable.', 'start': 41171.109, 'duration': 8.886}, {'end': 41186.06, 'text': 'The third method is regression substitution, followed by the fourth method, which is multiple imputation.', 'start': 41180.456, 'duration': 5.604}, {'end': 41191.794, 'text': 'The second question in this section is: explain the term normal distribution.', 'start': 41187.028, 'duration': 4.766}, {'end': 41196.379, 'text': 'The normal distribution is one of the most basic concepts in statistics,', 'start': 41192.374, 'duration': 4.005}, {'end': 41200.804, 'text': 'and it refers to a continuous probability distribution that is symmetric about the mean.', 'start': 41196.379, 'duration': 4.425}, {'end': 41208.964, 'text': 'An important point to remember here is that the mean, median and mode of a normal distribution are equal,', 'start': 41201.638, 'duration': 7.326}, {'end': 41212.488, 'text': 'and all three of them are located at the center of the distribution.', 'start': 41208.964, 'duration': 3.524}], 'summary': 'Handling missing values: list-wise deletion, average imputation, regression substitution, multiple imputation. 
Normal distribution: symmetric, mean=median=mode.', 'duration': 62.428, 'max_score': 41150.06, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA41150060.jpg'}, {'end': 41279.491, 'src': 'embed', 'start': 41238.69, 'weight': 21, 'content': [{'end': 41242.574, 'text': 'This feature distinguishes time series data from cross-sectional data.', 'start': 41238.69, 'duration': 3.884}, {'end': 41246.978, 'text': 'I have added two graphs below for you to understand this concept better.', 'start': 41243.395, 'duration': 3.583}, {'end': 41252.549, 'text': 'The next question is: explain the difference between overfitting and underfitting.', 'start': 41247.967, 'duration': 4.582}, {'end': 41257.812, 'text': 'This is one of the most basic questions you can find about building models.', 'start': 41253.09, 'duration': 4.722}, {'end': 41265.976, 'text': 'You say that your model is overfitting when it trains well on the training dataset, but its performance drops drastically on the testing dataset.', 'start': 41258.332, 'duration': 7.644}, {'end': 41271.039, 'text': 'This happens when your model learns random fluctuations and noise from your training dataset.', 'start': 41266.477, 'duration': 4.562}, {'end': 41279.491, 'text': 'Underfitting occurs when your model does not train well on the training dataset and performs poorly on both the training and testing datasets.', 'start': 41271.499, 'duration': 7.992}], 'summary': 'Distinguishes time series from cross-sectional data, overfitting vs. 
underfitting explained.', 'duration': 40.801, 'max_score': 41238.69, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA41238690.jpg'}, {'end': 41325.341, 'src': 'embed', 'start': 41300.083, 'weight': 1, 'content': [{'end': 41305.367, 'text': 'Basically, an outlier is a data point that is distant from other similar points.', 'start': 41300.083, 'duration': 5.284}, {'end': 41310.971, 'text': 'This may be due to variability in the measurement, or it may indicate experimental errors.', 'start': 41305.987, 'duration': 4.984}, {'end': 41319.177, 'text': 'To treat outliers, the first thing you can do is drop them, deleting all the records that contain outliers.', 'start': 41311.611, 'duration': 7.566}, {'end': 41322.479, 'text': 'The second method is capping the outlier values.', 'start': 41319.877, 'duration': 2.602}, {'end': 41325.341, 'text': 'The third method is assigning a new value.', 'start': 41322.919, 'duration': 2.422}], 'summary': 'Outliers are distant data points due to variability or errors. 
treat by dropping, capping, or assigning a new value.', 'duration': 25.258, 'max_score': 41300.083, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA41300083.jpg'}, {'end': 41456.838, 'src': 'embed', 'start': 41370.07, 'weight': 0, 'content': [{'end': 41373.031, 'text': 'What is the difference between type one and type two errors?', 'start': 41370.07, 'duration': 2.961}, {'end': 41378.994, 'text': 'A type one error occurs when the null hypothesis is rejected even though it is true,', 'start': 41373.692, 'duration': 5.302}, {'end': 41384.776, 'text': 'whereas a type two error occurs when the null hypothesis is not rejected even though it is false.', 'start': 41378.994, 'duration': 5.782}, {'end': 41390.192, 'text': "Let's end this section here and head over to the interview questions on Python.", 'start': 41385.629, 'duration': 4.563}, {'end': 41396.135, 'text': 'The first question in this section is: what is the correct syntax for the reshape function in NumPy?', 'start': 41390.752, 'duration': 5.383}, {'end': 41401.038, 'text': 'To use the reshape function, all you need to do is pass two parameters.', 'start': 41396.735, 'duration': 4.303}, {'end': 41406.22, 'text': 'The first parameter is the array name, and the second parameter is the new shape of that array.', 'start': 41401.558, 'duration': 4.662}, {'end': 41410.903, 'text': 'I have attached a snippet that uses the reshape function here.', 'start': 41406.881, 'duration': 4.022}, {'end': 41414.265, 'text': 'Go through it and type the output in the comment section.', 'start': 41411.583, 'duration': 2.682}, {'end': 41424.445, 'text': 'What are the different ways to create a data frame in pandas? 
For this question, you can list the two ways to create data frames in pandas.', 'start': 41415.475, 'duration': 8.97}, {'end': 41430.948, 'text': 'The first method is initializing from a list, and the second is initializing from a dictionary.', 'start': 41425.105, 'duration': 5.843}, {'end': 41434.189, 'text': 'To create a pandas data frame using lists,', 'start': 41431.288, 'duration': 2.901}, {'end': 41441.373, 'text': 'the first thing you need to do is type pandas.DataFrame and then give it two parameters: data and columns.', 'start': 41434.189, 'duration': 7.184}, {'end': 41445.654, 'text': 'The data parameter consists of a list containing other lists.', 'start': 41441.773, 'duration': 3.881}, {'end': 41448.376, 'text': 'Each inner list is converted into one of your records,', 'start': 41446.215, 'duration': 2.161}, {'end': 41452.557, 'text': 'and the second parameter, columns, will consist of your column names.', 'start': 41449.116, 'duration': 3.441}, {'end': 41456.838, 'text': 'Initializing a pandas data frame using a dictionary is even simpler.', 'start': 41453.417, 'duration': 3.421}], 'summary': 'Explains type one and type two errors, and introduces Python interview questions on the reshape function in NumPy and creating a data frame in pandas.', 'duration': 86.768, 'max_score': 41370.07, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA41370070.jpg'}, {'end': 41508.153, 'src': 'embed', 'start': 41483.047, 'weight': 10, 'content': [{'end': 41490.394, 'text': 'The first way is the concatenate method, where you type numpy.concatenate with A and B and then mention the axis.', 'start': 41483.047, 'duration': 7.347}, {'end': 41493.624, 'text': 'The second method is the hstack method.', 'start': 41491.243, 'duration': 2.381}, {'end': 41501.069, 'text': 'This is another NumPy method, where in the parameters you provide a list with the arrays A and B.', 'start': 41494.005, 'duration': 7.064}, {'end': 
41508.153, 'text': 'Next question: how can you add a column to a pandas data frame? Doing this is extremely simple.', 'start': 41501.069, 'duration': 7.084}], 'summary': 'NumPy offers two methods for combining arrays horizontally: concatenate and hstack.', 'duration': 25.106, 'max_score': 41483.047, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA41483047.jpg'}, {'end': 41559.321, 'src': 'embed', 'start': 41532.924, 'weight': 14, 'content': [{'end': 41537.427, 'text': "I have used numpy.random.randint and I've given a few parameters.", 'start': 41532.924, 'duration': 4.503}, {'end': 41540.669, 'text': 'This random function can help you generate your numbers.', 'start': 41537.967, 'duration': 2.702}, {'end': 41548.755, 'text': 'The randint function itself takes three parameters. The first value represents the start point.', 'start': 41541.471, 'duration': 7.284}, {'end': 41552.877, 'text': "You won't find any numbers being generated before this point.", 'start': 41549.496, 'duration': 3.381}, {'end': 41555.479, 'text': 'The second value is the end point, which is exclusive.', 'start': 41553.458, 'duration': 2.021}, {'end': 41559.321, 'text': "You won't find this number, or any value after it, being generated.", 'start': 41555.999, 'duration': 3.322}], 'summary': 'numpy.random.randint generates integers from the start point (inclusive) up to the end point (exclusive).', 'duration': 26.397, 'max_score': 41532.924, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA41532924.jpg'}], 'start': 39242.935, 'title': 'Data science and analytics', 'summary': 'Explains the distinction between data scientists and data analysts, their qualifications, skills, insights, and the importance of EDA and analytics techniques, including interview questions for data analysts.', 'chapters': [{'end': 40035.098, 'start': 39242.935, 'title': 'Data scientist vs data analyst', 'summary': "Explains the distinction between 
a data scientist and a data analyst, emphasizing their roles, responsibilities, and key differences, such as a data scientist's focus on future business problems and the use of machine learning, while a data analyst deals with current business problems and focuses on statistical analysis and reporting.", 'duration': 792.163, 'highlights': ['The responsibilities of a data scientist include building and testing machine learning models, working with diverse data sets, and uncovering hidden patterns to solve potential business problems, requiring a data-driven approach to uncover hidden opportunities from the data.', "A data analyst's responsibilities involve gathering and maintaining databases, statistical analysis and reporting, creating reports and dashboards for management, and collaborating with stakeholders and vendors, emphasizing the need for SQL knowledge and strong communication skills.", 'A data scientist focuses on potential business problems by generating questions and finding answers using predictive analysis and machine learning models, while a data analyst deals with current business problems by finding answers to given questions and using visualization and statistical tools for analysis.', 'A data scientist conducts A/B experiments, iteratively performing various experiments and improving them using ML algorithms, while a data analyst focuses on creating reports, dashboards, and presentations to aid decision-making, without the need for machine learning model iterations.']}, {'end': 40460.267, 'start': 40035.458, 'title': 'Data scientist vs data analyst: skills and qualifications', 'summary': 'Discusses the qualifications and key skills required for data scientists and data analysts, emphasizing the importance of storytelling for data scientists, the eligibility criteria for both roles, and the contrasting skill sets required for each position, with a focus on programming languages, machine learning, statistics, and visualization tools.', 'duration': 
424.809, 'highlights': ['Data scientists communicate findings through storytelling aided by visuals, while data analysts use presentations and reports with business intelligence tools like Power BI and Tableau. The chapter emphasizes the importance of storytelling for data scientists and the use of business intelligence tools for data analysts.', 'Eligibility criteria for a data scientist include graduation in science, technology, engineering, and mathematics, with a preference for a post-graduate degree or PhD, along with specialization in domains like mathematics, economics, statistics, and computer science. The eligibility criteria for data scientists are detailed and include specific educational qualifications and specialization preferences.', 'Key skills for a data scientist include programming languages like Python, knowledge of machine learning, NLP, big data technologies, statistics, and data mining techniques. The key skills required for data scientists are outlined, encompassing programming languages, machine learning, statistical knowledge, and data mining techniques.', 'Data analysts require strong knowledge of statistics, visualization tools like Power BI and Tableau, Excel, databases, and reporting tools such as SAP and Jira. 
The essential skills for data analysts are highlighted, including statistics, visualization tools, database knowledge, and reporting tools.']}, {'end': 40868.198, 'start': 40460.787, 'title': 'Data science and data analytics insights', 'summary': 'Covers the essential skills and salary estimates for data scientists and data analysts, along with common interview questions and necessary tools for a data analyst, concluding with guidance on choosing the right career path based on personal interest and qualifications.', 'duration': 407.411, 'highlights': ['Salary Estimates for Data Scientists and Analysts The salary estimates for data scientists in India range from 6 lakhs per annum for beginners to 20 lakhs per annum for experts, while for data analysts, it ranges from 3 lakhs per annum for beginners to 10 lakhs per annum for experts. In the US, data scientists can earn from $95,000 at the beginner level to $165,000 - $250,000 at the expert level, while data analysts can earn from $45,000 at the beginner level to $85,000 at the expert level.', 'Data Science vs Data Analytics Data mining involves identifying patterns and relationships in large datasets to solve business problems, while data profiling assesses the uniqueness, logic, and consistency of a dataset. 
Data wrangling is the process of transforming and mapping data from one form to another, making it more valuable for analytics.', 'Steps in a Data Analytics Project The steps in a data analytics project include understanding the problem, collecting data from various sources, cleaning the data, exploring and analyzing it, and interpreting the results to find hidden patterns and future trends.', 'Common Problems for Data Analysts Data analysts commonly face challenges such as handling redundant data, collecting and storing data securely, and addressing compliance issues.', 'Common Tools for Data Analysts Data analysts are expected to be familiar with database systems like MySQL and MongoDB, reporting and dashboard tools like Excel and Tableau, programming languages such as Python and R, and presentation software like PowerPoint and Keynote.']}, {'end': 41390.192, 'start': 40868.918, 'title': 'Importance of EDA and analytics techniques', 'summary': 'Emphasizes the significance of exploratory data analysis (EDA) in understanding data, decision-making, and model selection, and provides insights into descriptive, predictive, and prescriptive analytics. It also covers sampling techniques, univariate, bivariate, and multivariate analysis, methods for data cleaning, handling missing values, normal distribution, time series analysis, overfitting and underfitting, treating outliers, hypothesis testing, and type one and type two errors.', 'duration': 521.274, 'highlights': ['Exploratory data analysis (EDA) provides a better understanding of data, leading to more confidence in decisions and refinement of feature selection during modeling. EDA can help refine feature selection during modeling and lead to more confidence in decisions.', 'Descriptive analytics provides insights into past data, while predictive analytics focuses on predicting the future based on patterns in available data. 
Descriptive analytics helps understand past data, while predictive analytics predicts the future based on patterns.', 'Different types of sampling techniques include simple random sampling, systematic sampling, cluster sampling, stratified sampling, and judgmental sampling. Sampling techniques include simple random, systematic, cluster, stratified, and judgmental sampling.', 'Univariate analysis deals with data containing only one variable, while bivariate analysis involves comparing two variables, and multivariate analysis involves analyzing three or more variables. Univariate analysis deals with one variable, bivariate involves two, and multivariate involves three or more variables.', 'Data cleaning involves creating a plan, removing duplicates, focusing on data accuracy, normalizing data, and standardizing information for effective analysis. Data cleaning involves creating a plan, removing duplicates, focusing on accuracy, and standardizing information.', 'Handling missing values can be done through list-wise deletion, average imputation, regression substitution, and multiple imputations. Handling missing values involves list-wise deletion, average imputation, regression substitution, and multiple imputations.', 'Normal distribution is a symmetric continuous probability distribution where the mean, median, and mode are equal and located in the center of the distribution. Normal distribution is symmetric, with the mean, median, and mode equal and located in the center.', 'Overfitting occurs when a model performs well on the training dataset but poorly on the testing dataset, while underfitting occurs when a model performs poorly on both datasets due to small training dataset size or mismatched model complexity. 
Overfitting occurs when a model performs well on training but poorly on testing, while underfitting occurs due to small dataset size or mismatched model complexity.', 'Outliers can be treated by dropping them, capping their values, assigning new values, or trying a new transformation. Outliers can be treated by dropping, capping, assigning new values, or trying a new transformation.', 'Hypothesis testing involves accepting or rejecting a statistical hypothesis, with null and alternative hypotheses used, and it can lead to type one and type two errors. Hypothesis testing involves accepting or rejecting a statistical hypothesis, leading to type one and type two errors.']}, {'end': 41909.684, 'start': 41390.752, 'title': 'Data analyst interview questions', 'summary': 'Covers essential topics like reshaping arrays in NumPy, creating data frames in pandas, stacking arrays horizontally, adding columns to a data frame, generating random integers using NumPy, indexing arrays, selecting specific columns from a data frame in Python, and SQL concepts such as WHERE and HAVING clauses, subqueries, DELETE vs TRUNCATE statements, and query optimization.', 'duration': 518.932, 'highlights': ['Creating Data Frames in Pandas To create a pandas data frame, you can initialize it with lists or dictionaries, specifying the data and column names, offering flexibility and ease of use.', 'SQL WHERE and HAVING Clauses The WHERE clause filters individual rows before grouping, while the HAVING clause filters grouped, aggregated data and can use aggregate functions.', 'Query Optimization Query optimization aims to enhance query efficiency, generating outputs faster, reducing time and space complexity, and executing a large number of queries in less time.', 'Reshaping Arrays in NumPy Reshaping arrays in NumPy involves using the reshape function with two parameters, the array name and its desired shape, offering a fundamental operation for data manipulation.', 'Stacking 
Arrays Horizontally in NumPy Stacking arrays horizontally in NumPy can be achieved through methods like concatenate and hstack, providing options for combining arrays based on specific requirements.']}], 'duration': 2666.749, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Y6vY-fJbwUA/pics/Y6vY-fJbwUA39242935.jpg', 'highlights': ['Data scientists build and test ML models, uncover hidden patterns, and solve business problems.', 'Data analysts gather and maintain databases, perform statistical analysis, and create reports.', 'Data scientists focus on potential business problems using predictive analysis and ML models.', 'Data analysts deal with current business problems using visualization and statistical tools.', 'Data scientists communicate findings through storytelling aided by visuals.', 'Data analysts use presentations and reports with business intelligence tools like Power BI and Tableau.', 'Eligibility criteria for a data scientist include graduation in STEM fields and specialization in domains.', 'Key skills for a data scientist include programming languages like Python, machine learning, and statistics.', 'Data analysts require strong knowledge of statistics, visualization tools, Excel, databases, and reporting tools.', 'Salary estimates for data scientists in India range from 6 to 20 lakhs per annum.', 'Salary estimates for data analysts in India range from 3 to 10 lakhs per annum.', 'In the US, data scientists can earn from $95,000 to $250,000, while data analysts can earn from $45,000 to $85,000.', 'Data mining involves identifying patterns and relationships in large datasets to solve business problems.', 'Data profiling assesses the uniqueness, logic, and consistency of a dataset.', 'Data wrangling is the process of transforming and mapping data from one form to another.', 'Steps in a data analytics project include understanding the problem, collecting data, cleaning, exploring, and interpreting results.', 'Common problems 
for data analysts include handling redundant data, collecting and storing data securely, and addressing compliance issues.', 'Common tools for data analysts include database systems like MySQL and MongoDB, reporting and dashboard tools like Excel and Tableau, and programming languages such as Python and R.', 'EDA provides a better understanding of data, leading to more confidence in decisions and refinement of feature selection during modeling.', 'Descriptive analytics provides insights into past data, while predictive analytics focuses on predicting the future based on patterns.', 'Sampling techniques include simple random, systematic, cluster, stratified, and judgmental sampling.', 'Univariate analysis deals with one variable, bivariate involves two, and multivariate involves three or more variables.', 'Data cleaning involves creating a plan, removing duplicates, focusing on accuracy, and standardizing information.', 'Handling missing values involves list-wise deletion, average imputation, regression substitution, and multiple imputations.', 'Normal distribution is symmetric, with the mean, median, and mode equal and located in the center.', 'Overfitting occurs when a model performs well on training but poorly on testing, while underfitting occurs due to small dataset size or mismatched model complexity.', 'Outliers can be treated by dropping, capping, assigning new values, or trying a new transformation.', 'Hypothesis testing involves accepting or rejecting a statistical hypothesis, leading to type one and type two errors.', 'To create a pandas data frame, you can initialize it with lists or dictionaries, specifying the data and column names.', 'The where clause filters individual rows, while the having clause filters aggregated data.', 'Query optimization aims to enhance query efficiency, reducing time and space complexity.', 'Reshaping arrays in NumPy involves using the reshape function with two parameters, the array name and its desired shape.', 'Stacking 
arrays horizontally in NumPy can be achieved through methods like concatenate and hstack.']}], 'highlights': ['The demand for data analytics professionals is expected to grow at a rate of 18% in the next few years.', 'The comprehensive curriculum includes essential topics like statistics, SQL, Python, and practical projects, providing a well-rounded understanding of data analytics.', 'Over 35,000 job vacancies in India and over 174,000 in the US for data analysts, with high demand in regions like Bangalore and California.', 'The average annual salary for a data analyst is 5 lakh in India and $69,517 in the US.', 'Data visualization enables decision makers to grasp difficult concepts and identify new patterns, aiding in informed decision making.', 'The process of cleaning data involves handling null values by replacing them with appropriate values, dropping redundant columns, and analyzing the relationship between different variables.', 'The job market for data analysts is promising, with over 405,000 job openings in New York alone, and a salary range of $40,000 to $100,000 in the US, and 1,72,000 to 19,70,000 rupees in India, making it a lucrative career choice.', 'Data scientists build and test ML models, uncover hidden patterns, and solve business problems.', 'Data analysts gather and maintain databases, perform statistical analysis, and create reports.']}
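The missing-value and outlier answers in the interview segments above can be illustrated with pandas. This is a minimal sketch, not the course's own code. The DataFrame and its `age`/`score` columns are hypothetical. The outlier bounds use the common 1.5 × IQR rule, which is an assumption here because the transcript does not name a rule.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two numeric columns with gaps and one extreme value (500.0)
df = pd.DataFrame({
    "age": [25, np.nan, 31, 29, 40],
    "score": [88.0, 92.0, np.nan, 75.0, 500.0],
})

# 1. List-wise deletion: drop every record that has at least one missing value
listwise = df.dropna()

# 2. Mean (average) imputation: fill each column's gaps with that column's mean
mean_imputed = df.fillna(df.mean())

# Outlier bounds from the 1.5 * IQR rule (an assumed choice of rule)
q1, q3 = df["score"].quantile([0.25, 0.75])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# 3. Dropping outliers: keep in-range rows; rows with a missing score are kept
#    here because missing values are handled separately above
in_bounds = df["score"].between(low, high) | df["score"].isna()
dropped = df[in_bounds]

# 4. Capping outliers: clip scores to the bounds instead of deleting rows
capped = df["score"].clip(lower=low, upper=high)

print(listwise)
print(mean_imputed)
print(dropped)
print(capped)
```

Regression substitution and multiple imputation are not shown, since both require fitting a model to the observed values.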
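The Python interview answers above (reshape, the two DataFrame constructors, horizontal stacking, adding a column, and generating random integers) can be sketched as follows. The arrays, names and column labels are hypothetical examples, not the snippets shown in the video. Note that `numpy.random.randint` treats the end point as exclusive, so with `(1, 10)` the largest possible value is 9.

```python
import numpy as np
import pandas as pd

# reshape: pass the array and the new shape (6 elements -> 2 rows x 3 columns)
a = np.arange(6)
reshaped = np.reshape(a, (2, 3))  # equivalent to a.reshape(2, 3)

# DataFrame from a list of lists: each inner list becomes one record (row)
df_from_list = pd.DataFrame(
    data=[["Asha", 30], ["Ravi", 25]],
    columns=["name", "age"],
)

# DataFrame from a dictionary: keys become column names, values become columns
df_from_dict = pd.DataFrame({"name": ["Asha", "Ravi"], "age": [30, 25]})

# Stacking two 2-D arrays horizontally (side by side)
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
via_concat = np.concatenate((A, B), axis=1)  # axis=1 joins along columns
via_hstack = np.hstack([A, B])               # takes a sequence of arrays

# Adding a column: assign values of matching length to a new column label
df_from_dict["city"] = ["Pune", "Delhi"]

# Random integers: start is inclusive, end is exclusive, size is how many to draw
random_ints = np.random.randint(1, 10, size=5)

print(reshaped)
print(df_from_list)
print(df_from_dict)
print(via_concat)
print(random_ints)
```

For new code, NumPy's documentation recommends the `Generator` API, for example `np.random.default_rng().integers(1, 10, size=5)`. By default it uses the same inclusive-start, exclusive-end convention.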