title
What Is Data Science? Data Science Course - Data Science Tutorial For Beginners | Edureka

description
( Data Science Training - https://www.edureka.co/data-science-r-programming-certification-course )
This Edureka Data Science course video (Data Science Blog Series: https://goo.gl/yGjZfs) will take you through the need for data science, what data science is, data science use cases for business, BI vs data science, data analytics tools, and the data science lifecycle, along with a demo. This Data Science tutorial video is ideal for beginners to learn data science and machine learning basics. You can read the blog here: https://goo.gl/lYb5Lb
Subscribe to our channel to get video updates. Hit the subscribe button above.
Check our complete Data Science playlist here: https://goo.gl/60NJJS
#whatisdatascience #Datasciencetutorial #Datasciencecourse #datascience
How it Works?
1. There will be 30 hours of instructor-led interactive online classes, 40 hours of assignments and 20 hours of project work.
2. We have 24x7 One-on-One LIVE Technical Support to help you with any problems you might face or any clarifications you may require during the course.
3. You will get Lifetime Access to the recordings in the LMS.
4. At the end of the training you will have to complete the project, based on which we will provide you a Verifiable Certificate!
- - - - - - - - - - - - - -
About the Course
Edureka's Data Science course will cover the whole data life cycle, ranging from Data Acquisition and Data Storage using R-Hadoop concepts, to applying modelling through R programming using Machine Learning algorithms, to illustrating Data Visualization by leveraging R's capabilities.
- - - - - - - - - - - - - -
Why Learn Data Science?
Data Science training certifies you with ‘in demand’ Big Data Technologies to help you grab the top paying Data Science job title with Big Data skills and expertise in R programming, Machine Learning and Hadoop framework.
After the completion of the Data Science course, you should be able to:
1. Gain insight into the 'Roles' played by a Data Scientist
2. Analyse Big Data using R, Hadoop and Machine Learning
3. Understand the Data Analysis Life Cycle
4. Work with different data formats like XML, CSV, SAS, SPSS, etc.
5. Learn tools and techniques for data transformation
6. Understand Data Mining techniques and their implementation
7. Analyse data using machine learning algorithms in R
8. Work with Hadoop Mappers and Reducers to analyze data
9. Implement various Machine Learning Algorithms in Apache Mahout
10. Gain insight into data visualization and optimization techniques
11. Explore the parallel processing feature in R
- - - - - - - - - - - - - -
Who should go for this course?
The course is designed for all those who want to learn machine learning techniques with implementation in R language, and wish to apply these techniques on Big Data. The following professionals can go for this course:
1. Developers aspiring to be a 'Data Scientist'
2. Analytics Managers who are leading a team of analysts
3. SAS/SPSS Professionals looking to gain understanding in Big Data Analytics
4. Business Analysts who want to understand Machine Learning (ML) Techniques
5. Information Architects who want to gain expertise in Predictive Analytics
6. 'R' professionals who want to captivate and analyze Big Data
7. Hadoop Professionals who want to learn R and ML techniques
8. Analysts wanting to understand Data Science methodologies
For more information, please write back to us at sales@edureka.co or call us at IND: 9606058406 / US: 18338555775 (toll free).
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
Customer Reviews:
Gnana Sekhar Vangara, Technology Lead at WellsFargo.com, says, "Edureka Data science course provided me a very good mixture of theoretical and practical training. The training course helped me in all areas that I was previously unclear about, especially concepts like Machine learning and Mahout. The training was very informative and practical. LMS pre-recorded sessions and assignments were very good as there is a lot of information in them that will help me in my job. The trainer was able to explain difficult to understand subjects in simple terms. Edureka is my teaching GURU now...Thanks EDUREKA and all the best."

detail
{'title': 'What Is Data Science? Data Science Course - Data Science Tutorial For Beginners | Edureka', 'heatmap': [{'end': 2766.172, 'start': 2719.42, 'weight': 0.703}, {'end': 3145.437, 'start': 2913.581, 'weight': 0.931}, {'end': 3330.92, 'start': 3254.34, 'weight': 0.751}], 'summary': 'Provides an in-depth exploration of data science essentials, its impact on sales analysis, applications in various domains, retail chain analysis, the role of domain expertise and data scientist responsibilities, contrasting business intelligence with data science, utilization of tableau in data visualization, and the model building process, featuring expert insights and real-world examples.', 'chapters': [{'end': 373.858, 'segs': [{'end': 64.943, 'src': 'embed', 'start': 17.845, 'weight': 2, 'content': [{'end': 24.99, 'text': 'I am a post graduate in computer science and I have also done my masters in business analytics from Great Lakes.', 'start': 17.845, 'duration': 7.145}, {'end': 31.514, 'text': "I have been working in data science field for last five to six years and I've worked on varied projects,", 'start': 25.53, 'duration': 5.984}, {'end': 36.158, 'text': 'mostly from health science domain and also in the retail domain.', 'start': 31.514, 'duration': 4.644}, {'end': 42.562, 'text': 'My experience has been into a lot of predictive analytics, predicting what is the best policies,', 'start': 36.418, 'duration': 6.144}, {'end': 46.785, 'text': 'what are the best insurance policies for the users in US market.', 'start': 42.562, 'duration': 4.223}, {'end': 55.29, 'text': 'some approaches that retail chains can take to maximize their business, and I have also been into a lot into training,', 'start': 47.225, 'duration': 8.065}, {'end': 62.614, 'text': 'into teaching data science courses, and I have been associated with different institutes, very renowned institutes, as guest faculty.', 'start': 55.29, 'duration': 7.324}, {'end': 64.943, 'text': 'So this was about me.', 'start': 
63.502, 'duration': 1.441}], 'summary': 'Experienced data scientist with 5-6 years, specializing in health science and retail domains, and teaching at renowned institutes.', 'duration': 47.098, 'max_score': 17.845, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA17845.jpg'}, {'end': 128.372, 'src': 'embed', 'start': 84.199, 'weight': 1, 'content': [{'end': 90.905, 'text': "Just ask me a question whenever you feel the need or if you don't understand anything that I mentioned so I'll be happy to repeat it.", 'start': 84.199, 'duration': 6.706}, {'end': 94.289, 'text': "So we'll start with this session now.", 'start': 92.048, 'duration': 2.241}, {'end': 99.092, 'text': 'The first thing that we have at our hand is what is the agenda for today.', 'start': 94.97, 'duration': 4.122}, {'end': 103.174, 'text': 'So we will see in the first part that what is the need for data science?', 'start': 99.192, 'duration': 3.982}, {'end': 106.616, 'text': 'why are we actually involved in data science?', 'start': 103.174, 'duration': 3.442}, {'end': 108.597, 'text': 'why do we need to study data science?', 'start': 106.616, 'duration': 1.981}, {'end': 111.258, 'text': 'Then second we will see what is data science.', 'start': 108.937, 'duration': 2.321}, {'end': 114.66, 'text': "So in first part we'll cover the need, where does it come from.", 'start': 111.298, 'duration': 3.362}, {'end': 120.225, 'text': 'and then what actually it is, how does it solve the problems that we have?', 'start': 115.2, 'duration': 5.025}, {'end': 128.372, 'text': 'and we will then see some use cases of data science, how it is used in the industry, what kinds of problems we actually solve using data science.', 'start': 120.225, 'duration': 8.147}], 'summary': 'Introduction to data science: discussing its need, definition, and industry applications.', 'duration': 44.173, 'max_score': 84.199, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA84199.jpg'}, {'end': 192.215, 'src': 'embed', 'start': 149.994, 'weight': 0, 'content': [{'end': 156.761, 'text': 'then we will also see some of the tools that are used in data science, some technologies that data science employs,', 'start': 149.994, 'duration': 6.767}, {'end': 159.624, 'text': 'and what do we need to learn to actually master data science.', 'start': 156.761, 'duration': 2.863}, {'end': 168.549, 'text': 'and finally we will see that what is the life cycle of a data science project or what is the approach that we follow in solving any problem using data science?', 'start': 159.904, 'duration': 8.645}, {'end': 172.091, 'text': 'So this is the agenda for today and, as I said,', 'start': 169.049, 'duration': 3.042}, {'end': 178.794, 'text': 'we will take questions as we go and any point of time you can interrupt me and if you have any queries,', 'start': 172.091, 'duration': 6.703}, {'end': 183.497, 'text': 'I will be happy to take them up during the session and as well as at the end of the session.', 'start': 178.794, 'duration': 4.703}, {'end': 186.611, 'text': 'so what is the need of the data science?', 'start': 184.75, 'duration': 1.861}, {'end': 192.215, 'text': 'this is a very important graphic that shows what actually led us to data science.', 'start': 186.611, 'duration': 5.604}], 'summary': 'Overview of data science tools, technologies, and lifecycle. 
Importance of data science highlighted.', 'duration': 42.221, 'max_score': 149.994, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA149994.jpg'}, {'end': 332.291, 'src': 'embed', 'start': 307.6, 'weight': 9, 'content': [{'end': 315.584, 'text': 'now, what data science has enabled us to do is that we are moving from an approach where we had reports that told us what is the situation.', 'start': 307.6, 'duration': 7.984}, {'end': 321.266, 'text': 'the picture of the present condition looks like to more of a decision making and predictive approach.', 'start': 315.584, 'duration': 5.682}, {'end': 326.929, 'text': 'So now we are actually moving from an approach where we could see the actual status.', 'start': 321.426, 'duration': 5.503}, {'end': 329.27, 'text': 'what is the current status versus?', 'start': 326.929, 'duration': 2.341}, {'end': 332.291, 'text': 'what actions do we need to take moving forward,', 'start': 329.27, 'duration': 3.021}], 'summary': 'Data science enables shifting from descriptive to predictive approach for decision making.', 'duration': 24.691, 'max_score': 307.6, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA307600.jpg'}], 'start': 0.029, 'title': 'Data science essentials', 'summary': 'Introduces Pankaj Gupta, a data science expert with 13 years of experience, emphasizing the significance of data science and predictive analytics. It covers data science, its use cases, comparison with business intelligence, tools and technologies used, and the data science project life cycle. 
Additionally, it discusses how data science enables decision-making and predictive approaches through insights from data.', 'chapters': [{'end': 106.616, 'start': 0.029, 'title': 'Introduction to data science', 'summary': "Introduces Pankaj Gupta, a data science expert with 13 years of experience, and outlines the agenda for the session, emphasizing the significance of data science and the speaker's expertise in predictive analytics and retail domain.", 'duration': 106.587, 'highlights': ['Pankaj Gupta has 13 years of experience in IT and business analytics domain, with a focus on health science and retail projects.', 'He holds a post graduate degree in computer science and a masters in business analytics from Great Lakes.', 'Pankaj Gupta has extensive experience in predictive analytics, including identifying the best insurance policies for US market users and advising retail chains on maximizing their business.', 'He has been involved in training and teaching data science courses at renowned institutes as a guest faculty.']}, {'end': 232.106, 'start': 106.616, 'title': 'Introduction to data science', 'summary': 'Covers the need for studying data science, an explanation of data science, its use cases, a comparison with business intelligence, tools and technologies used, and the data science project life cycle.', 'duration': 125.49, 'highlights': ['The need for data science is driven by the rapid creation of data in both structured and unstructured forms, such as real-time traffic data captured by satellites and transmitted to handheld devices.', 'Data science complements and extends the capabilities of business intelligence, taking it one step further.', 'The chapter compares business intelligence and data science, highlighting how the latter is complementing and extending the capabilities of the former.', 'The chapter discusses the rapid creation of data, including real-time traffic data captured by satellites and transmitted to handheld devices, emphasizing the 
need for data science in handling this rapid data creation.']}, {'end': 373.858, 'start': 232.106, 'title': 'Data science for decision making', 'summary': 'Discusses how data science enables decision-making and predictive approaches through insights from data, such as traffic analysis, resource utilization, and sales forecasting.', 'duration': 141.752, 'highlights': ['Data science enables decision-making and predictive approaches', 'Traffic analysis and route suggestions', 'Sales forecasting and pattern discovery', 'Lack of scientific insights in utilizing data']}], 'duration': 373.829, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA29.jpg', 'highlights': ['Pankaj Gupta has 13 years of experience in IT and business analytics domain, with a focus on health science and retail projects.', 'The need for data science is driven by the rapid creation of data in both structured and unstructured forms.', 'Data science enables decision-making and predictive approaches.', 'He holds a post graduate degree in computer science and a masters in business analytics from Great Lakes.', 'Data science complements and extends the capabilities of business intelligence, taking it one step further.', 'Pankaj Gupta has extensive experience in predictive analytics, including identifying the best insurance policies for US market users and advising retail chains on maximizing their business.', 'The chapter compares business intelligence and data science, highlighting how the latter is complementing and extending the capabilities of the former.', 'The chapter discusses the rapid creation of data, emphasizing the need for data science in handling this rapid data creation.', 'He has been involved in training and teaching data science courses at renowned institutes as a guest faculty.', 'Traffic analysis and route suggestions', 'Sales forecasting and pattern discovery', 'Lack of scientific insights in utilizing data']}, {'end': 952.383, 
'segs': [{'end': 405.044, 'src': 'embed', 'start': 373.858, 'weight': 10, 'content': [{'end': 378.759, 'text': 'and a secondary reason can also be because there is weather changing, there is change of season,', 'start': 373.858, 'duration': 4.901}, {'end': 381.64, 'text': 'so there are lot of sales happen during that kind of season.', 'start': 378.759, 'duration': 2.881}, {'end': 387.262, 'text': 'So we are able to actually draw a lot of patterns from data and do those discovery.', 'start': 382.08, 'duration': 5.182}, {'end': 389.923, 'text': 'This is what leads us to data science.', 'start': 387.862, 'duration': 2.061}, {'end': 397.886, 'text': 'So this is where we actually draw a need of data science that how do we actually analyze all of those data?', 'start': 390.363, 'duration': 7.523}, {'end': 405.044, 'text': 'how do we find out what are the hidden patterns in the data and what predictions do we make based on the data?', 'start': 397.886, 'duration': 7.158}], 'summary': 'Weather changes drive seasonal sales, leading to data science need.', 'duration': 31.186, 'max_score': 373.858, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA373858.jpg'}, {'end': 559.412, 'src': 'embed', 'start': 532.738, 'weight': 4, 'content': [{'end': 546.921, 'text': 'if I purchased pair of shoes from adidas and I did not like it and on twitter I just rambled my emotions and I said that I purchased a pair of shoes from adidas and they are of very poor quality,', 'start': 532.738, 'duration': 14.183}, {'end': 548.361, 'text': 'and I put a hashtag there.', 'start': 546.921, 'duration': 1.44}, {'end': 555.568, 'text': 'So this becomes my unstructured data, because there is no definite format of this data.', 'start': 548.901, 'duration': 6.667}, {'end': 559.412, 'text': 'I can post any data in any format whichever I choose.', 'start': 555.948, 'duration': 3.464}], 'summary': 'Customer expresses dissatisfaction with adidas shoes on 
twitter.', 'duration': 26.674, 'max_score': 532.738, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA532738.jpg'}, {'end': 641.346, 'src': 'embed', 'start': 582.304, 'weight': 0, 'content': [{'end': 584.386, 'text': 'So big data has become a term in itself.', 'start': 582.304, 'duration': 2.082}, {'end': 592.275, 'text': 'So earlier we used to have 1 GB or few MBs of data available that could easily be stored on a simple hard drive.', 'start': 584.527, 'duration': 7.748}, {'end': 599.003, 'text': 'But now we have humongous amount of data which cannot be stored on a single secondary storage device.', 'start': 592.856, 'duration': 6.147}, {'end': 602.747, 'text': 'It has to be distributed, it has to be stored in multiple computers.', 'start': 599.263, 'duration': 3.484}, {'end': 612.57, 'text': 'It has to be stored on cloud and it has to be accessed in ways which were not in faster ways that our traditional data access systems do not allow.', 'start': 603.187, 'duration': 9.383}, {'end': 615.611, 'text': 'Then we apply a lot of data science algorithms.', 'start': 612.83, 'duration': 2.781}, {'end': 620.052, 'text': 'We just do not do a simple reporting as we used to do in traditional BI.', 'start': 615.791, 'duration': 4.261}, {'end': 622.833, 'text': 'Now we are doing a lot of data visualization.', 'start': 620.272, 'duration': 2.561}, {'end': 625.054, 'text': 'We are seeing what data speaks.', 'start': 623.093, 'duration': 1.961}, {'end': 631.836, 'text': 'We are looking into the stories that the data tells us and then we are doing a lot of scientific discoveries based on those data.', 'start': 625.114, 'duration': 6.722}, {'end': 633.998, 'text': 'It is not just predetermined reports.', 'start': 632.196, 'duration': 1.802}, {'end': 636.681, 'text': 'We are doing a lot of ad hoc queries also.', 'start': 634.418, 'duration': 2.263}, {'end': 641.346, 'text': 'So we are able to actually forecast what is my sales 
going to look like.', 'start': 637.161, 'duration': 4.185}], 'summary': 'Big data requires distributed storage, data visualization, and predictive analytics for scientific discoveries and forecasting sales.', 'duration': 59.042, 'max_score': 582.304, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA582304.jpg'}, {'end': 854.009, 'src': 'embed', 'start': 831.053, 'weight': 9, 'content': [{'end': 842.979, 'text': 'So it was considered that it would be beneficial if they keep some beer racks next to diaper bins so that people can easily pick them up and they enhance the sales of their beer plus diaper combinations.', 'start': 831.053, 'duration': 11.926}, {'end': 849.265, 'text': 'similarly, they can be used to predict high LTV customers and help in customer segmentation.', 'start': 844.281, 'duration': 4.984}, {'end': 850.846, 'text': 'so we can do segmentation.', 'start': 849.265, 'duration': 1.581}, {'end': 854.009, 'text': 'so LTV is nothing but loan to value customers.', 'start': 850.846, 'duration': 3.163}], 'summary': 'Placing beer racks next to diaper bins can boost sales and customer segmentation based on ltv.', 'duration': 22.956, 'max_score': 831.053, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA831053.jpg'}], 'start': 373.858, 'title': 'The need for data science and its impact on sales analysis', 'summary': 'Explores the necessity of data science for analyzing patterns, making predictions, and drawing insights, as well as its evolution from structured databases to data warehouses. it also delves into the evolution of data analysis, including handling structured and unstructured data, big data storage, algorithms, ad hoc queries, and its impact on the retail domain. 
Additionally, it discusses insights from sales data analysis, such as the correlation between sales of baby diapers and beers, leading to strategic product placement and machine intelligence development.', 'chapters': [{'end': 447.357, 'start': 373.858, 'title': 'The need for data science', 'summary': 'Discusses the need for data science in analyzing patterns, making predictions, and drawing insights from data, as well as the evolution from structured databases to data warehouses.', 'duration': 73.499, 'highlights': ['The need for data science arises from analyzing hidden patterns, making predictions, and drawing insights from data.', 'The evolution from structured databases to data warehouses occurred as the data grew bigger, leading to the storage of historical and transactional data separately.', 'Sales increase during weather changes and seasonal shifts, leading to a lot of sales during those times.']}, {'end': 731.641, 'start': 447.357, 'title': 'Evolution of data analysis', 'summary': 'Discusses the evolution of data analysis from traditional BI to handling structured and unstructured data, big data storage with Hadoop, data science algorithms, ad hoc queries, and its impact on retail domain with examples of product placement in stores and a case study from Walmart.', 'duration': 284.284, 'highlights': ['Data science algorithms enable ad hoc queries and scientific discoveries based on data, allowing businesses to forecast sales and customer behavior.', 'The shift to storing unstructured data alongside structured data has led to the need for big data storage solutions like Hadoop, due to the massive increase in data volume.', 'The evolution of data analysis from traditional BI to data science involves a focus on data visualization, storytelling through data, and scientific discoveries.', 'The impact of data analysis is evident in the retail domain, where data science is used to optimize product placement and enhance business.', "Example of Walmart's sales 
analysis on diapers demonstrates the practical application of data science in retail, showcasing the power of data-driven insights for business decisions."]}, {'end': 952.383, 'start': 732.182, 'title': 'Sales data analysis insights', 'summary': 'Discusses how sales data analysis revealed a correlation between the sales of baby diapers and beers on friday afternoons, leading to strategic product placement, customer segmentation, and machine intelligence development.', 'duration': 220.201, 'highlights': ['Sales data analysis revealed a correlation between the sales of baby diapers and beers, particularly on Friday afternoons, leading to strategic product placement.', 'Data science mechanisms enabled the identification of high LTV customers and facilitated customer segmentation for banks and financial institutions.', 'Development of intelligence in machines through data collection and analysis, leading to advanced features in cars and predictive capabilities.']}], 'duration': 578.525, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA373858.jpg', 'highlights': ['The need for data science arises from analyzing hidden patterns, making predictions, and drawing insights from data.', 'The evolution from structured databases to data warehouses occurred as the data grew bigger, leading to the storage of historical and transactional data separately.', 'The shift to storing unstructured data alongside structured data has led to the need for big data storage solutions like Hadoop, due to the massive increase in data volume.', 'The impact of data analysis is evident in the retail domain, where data science is used to optimize product placement and enhance business.', 'Sales data analysis revealed a correlation between the sales of baby diapers and beers, particularly on Friday afternoons, leading to strategic product placement.', 'Data science mechanisms enabled the identification of high LTV customers and facilitated 
customer segmentation for banks and financial institutions.', 'Development of intelligence in machines through data collection and analysis, leading to advanced features in cars and predictive capabilities.', 'Sales increase during weather changes and seasonal shifts, leading to a lot of sales during those times.', 'Data science algorithms enable ad hoc queries and scientific discoveries based on data, allowing businesses to forecast sales and customer behavior.', 'The evolution of data analysis from traditional BI to data science involves a focus on data visualization, storytelling through data, and scientific discoveries.', "Example of Walmart's sales analysis on diapers demonstrates the practical application of data science in retail, showcasing the power of data-driven insights for business decisions."]}, {'end': 1339.123, 'segs': [{'end': 1026.443, 'src': 'embed', 'start': 994.808, 'weight': 4, 'content': [{'end': 1000.352, 'text': 'algorithms and machine learning principles with the goal to discover hidden patterns from the raw data.', 'start': 994.808, 'duration': 5.544}, {'end': 1006.4, 'text': 'So we have data coming from multiple sources, a lot of information being collected as I just mentioned.', 'start': 1000.953, 'duration': 5.447}, {'end': 1009.023, 'text': 'But then what is the use of that data?', 'start': 1006.72, 'duration': 2.303}, {'end': 1010.264, 'text': 'How do we use that data?', 'start': 1009.083, 'duration': 1.181}, {'end': 1012.256, 'text': 'to use that data?', 'start': 1011.055, 'duration': 1.201}, {'end': 1016.738, 'text': 'there are some statistical algorithms that we use to figure out that.', 'start': 1012.256, 'duration': 4.482}, {'end': 1018.839, 'text': 'what is the story that the data is telling?', 'start': 1016.738, 'duration': 2.101}, {'end': 1026.443, 'text': 'we use various tools that are prevalent, that help us in identifying those patterns in data sciences, in the data,', 'start': 1018.839, 'duration': 7.604}], 'summary': 
'Utilizing algorithms and machine learning to extract patterns from diverse data sources.', 'duration': 31.635, 'max_score': 994.808, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA994808.jpg'}, {'end': 1145.19, 'src': 'embed', 'start': 1110.871, 'weight': 0, 'content': [{'end': 1117.236, 'text': 'there are lot of parameters that are used, like average pass time of the ball, number of successful passes,', 'start': 1110.871, 'duration': 6.365}, {'end': 1121.64, 'text': 'speed and accuracy of the successful baskets and area of the court.', 'start': 1117.236, 'duration': 4.404}, {'end': 1126.063, 'text': 'it is art of sports has now transformed into science, of data science.', 'start': 1121.64, 'duration': 4.423}, {'end': 1137.367, 'text': 'So there are a lot of data science being used and there is a person who is constantly looking at his computer and drawing all of these insights and sending signals to people playing in the court as well.', 'start': 1127.004, 'duration': 10.363}, {'end': 1140.788, 'text': 'In e-commerce data science is being widely used.', 'start': 1138.148, 'duration': 2.64}, {'end': 1145.19, 'text': 'So Amazon for example has a huge number of consumer purchasing data.', 'start': 1141.049, 'duration': 4.141}], 'summary': 'Data science is transforming sports, with parameters like average pass time, successful passes, basket speed and accuracy being used. 
E-commerce, such as Amazon, also heavily utilizes data science for consumer purchasing data.', 'duration': 34.319, 'max_score': 1110.871, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA1110871.jpg'}, {'end': 1262.408, 'src': 'embed', 'start': 1237.155, 'weight': 3, 'content': [{'end': 1243.559, 'text': 'Google car, Google is now developing a self-driving car and this is something that people have been waiting for long time.', 'start': 1237.155, 'duration': 6.404}, {'end': 1248.101, 'text': 'So Google is developing driverless car which would be very smart.', 'start': 1244.199, 'duration': 3.902}, {'end': 1254.184, 'text': 'So how it would work is that it would collect data in real time through the environment surroundings,', 'start': 1248.581, 'duration': 5.603}, {'end': 1262.408, 'text': 'based on several sensors that would be located in the car, and these sensors would collect information like temperature, weather data,', 'start': 1254.184, 'duration': 8.224}], 'summary': 'Google is developing a self-driving car with sensors to collect real-time environmental data.', 'duration': 25.253, 'max_score': 1237.155, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA1237155.jpg'}], 'start': 952.383, 'title': 'Data science applications', 'summary': 'Explores data science applications in fraud detection, sentiment analysis, sports analytics, and e-commerce, including its use in tracking team strategies and player performance in basketball, as well as in personalized recommendations in e-commerce and the development of self-driving cars.', 'chapters': [{'end': 1074.759, 'start': 952.383, 'title': 'Data science: tools, algorithms, and applications', 'summary': 'Discusses the relevance and application of data science, emphasizing its role in fraud detection, sentiment analysis for disaster response, and sports analytics, showcasing the use of statistical algorithms 
and machine learning principles to make predictions and aid decision-making.', 'duration': 122.376, 'highlights': ['Data science is used for fraud detection, enabling real-time measures to avoid fraudulent transactions.', 'Sentiment analysis was utilized for flood response in Chennai, identifying areas in need of help and areas where aid was not reaching.', 'Data science involves the use of statistical algorithms and machine learning principles to discover hidden patterns and make predictions from raw data.', 'Data science is widely used in sports analytics, providing insights such as the impact of specific player performance on match outcomes.']}, {'end': 1339.123, 'start': 1075.359, 'title': 'Data science in sports and e-commerce', 'summary': "Discusses the use of data science in basketball for tracking team strategies and player performance, as well as its application in e-commerce, including Amazon's use of consumer purchasing data and personalized recommendations, and Google's development of self-driving cars based on real-time environmental data.", 'duration': 263.764, 'highlights': ['Amazon has trillions of rows of consumer purchasing data and segments customer profiles to make personalized recommendations, resulting in more efficient stock delivery and decision making.', 'Google is developing a self-driving car that collects real-time environmental data through sensors to make decisions about speed, overtaking, and turning, showcasing the application of data science in autonomous driving.', "Basketball teams use data for tracking team strategies, player performance, and the correlation between the height of players and the team's chances of winning."]}], 'duration': 386.74, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA952383.jpg', 'highlights': ['Amazon segments customer profiles for personalized recommendations, improving stock delivery.', "Google's self-driving car uses real-time environmental 
data for autonomous decision-making.", 'Data science enables real-time fraud detection measures to avoid fraudulent transactions.', 'Data science is used in sports analytics to provide insights into player performance and match outcomes.', 'Sentiment analysis was utilized for flood response in Chennai, identifying areas in need of help.']}, {'end': 2144.867, 'segs': [{'end': 1600.555, 'src': 'embed', 'start': 1577.492, 'weight': 6, 'content': [{'end': 1584.48, 'text': 'and that is done on basis of several different parameters, like the source, the destination to which they are flying,', 'start': 1577.492, 'duration': 6.988}, {'end': 1591.769, 'text': 'to the weather conditions in those areas, whether they are geographically suitable for flight travel or there is lot of turbulence,', 'start': 1584.48, 'duration': 7.289}, {'end': 1600.555, 'text': 'then the average wait time that is there, the number of planes that airlines have, and a lot of different parameters come into actually deducing that.', 'start': 1591.769, 'duration': 8.786}], 'summary': 'Flight schedules are determined by various parameters such as source, destination, weather, geographical suitability, turbulence, average wait time, and number of planes.', 'duration': 23.063, 'max_score': 1577.492, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA1577492.jpg'}, {'end': 1687.006, 'src': 'embed', 'start': 1660.377, 'weight': 4, 'content': [{'end': 1665.019, 'text': 'we are able to actually predict whether this medicine would be effective or not in treating people.', 'start': 1660.377, 'duration': 4.642}, {'end': 1669.91, 'text': 'It is used in sales for offering discounts and for demand forecasting.', 'start': 1665.845, 'duration': 4.065}, {'end': 1672.573, 'text': 'So, looking at the trends of sales,', 'start': 1670.33, 'duration': 2.243}, {'end': 1680.522, 'text': 'that over the last few months or few years we can actually accurately forecast that 
what would be the demand for certain products.', 'start': 1672.573, 'duration': 7.949}, {'end': 1687.006, 'text': 'and for this we not only look at the historical data, we look at seasonal data as well.', 'start': 1681.162, 'duration': 5.844}], 'summary': 'Predict effectiveness of medicine and forecast demand based on sales trends and historical/seasonal data.', 'duration': 26.629, 'max_score': 1660.377, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA1660377.jpg'}, {'end': 1734.103, 'src': 'embed', 'start': 1703.975, 'weight': 5, 'content': [{'end': 1709.718, 'text': 'and what were the circumstances at that point of time that helped us in achieving that particular sale,', 'start': 1703.975, 'duration': 5.743}, {'end': 1716.1, 'text': 'and what are the decisions that we should take now to increase that sale or to take it to a different level.', 'start': 1709.718, 'duration': 6.382}, {'end': 1719.441, 'text': 'so in credit insurance it is very widely used.', 'start': 1716.1, 'duration': 3.341}, {'end': 1727.303, 'text': 'we see that who are likely to actually return our loans or time and also look at people who are likely to default.', 'start': 1719.441, 'duration': 7.862}, {'end': 1734.103, 'text': 'In one project that we did for banks, we were actually able to predict the number of dead accounts.', 'start': 1728.001, 'duration': 6.102}], 'summary': 'Credit insurance widely used, predicted dead accounts for banks.', 'duration': 30.128, 'max_score': 1703.975, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA1703975.jpg'}, {'end': 1997.931, 'src': 'embed', 'start': 1934.035, 'weight': 0, 'content': [{'end': 1938.777, 'text': 'and then there is machine learning, where the machine actually learns based on data and it improves itself.', 'start': 1934.035, 'duration': 4.742}, {'end': 1941.659, 'text': 'So, we improve the algorithms an example would be.', 
'start': 1938.817, 'duration': 2.842}, {'end': 1942.379, 'text': 'So let us say,', 'start': 1941.739, 'duration': 0.64}, {'end': 1952.027, 'text': 'if you are a store owner and there are 3 customers that come to your store and they are all children and they come to your store and they buy toys,', 'start': 1942.379, 'duration': 9.648}, {'end': 1958.669, 'text': 'so what impression that you will draw is that since most of my customers are children, I should stock more children stuff.', 'start': 1952.027, 'duration': 6.642}, {'end': 1961.35, 'text': "but as you get more people so let's say,", 'start': 1958.669, 'duration': 2.681}, {'end': 1968.133, 'text': 'if there are two adults also entering your store and they demand for some grocery then your decision would change.', 'start': 1961.35, 'duration': 6.783}, {'end': 1973.756, 'text': 'Now you have gained an additional learning that my store is not only for children.', 'start': 1968.593, 'duration': 5.163}, {'end': 1976.618, 'text': 'I also need to focus on some adult stuff.', 'start': 1973.756, 'duration': 2.862}, {'end': 1979.8, 'text': 'but since 3 children out of 5 were,', 'start': 1976.618, 'duration': 3.182}, {'end': 1987.485, 'text': '3 of my 5 customers were children it makes more sense to have more children related stock and less of the adult stock.', 'start': 1979.8, 'duration': 7.685}, {'end': 1989.206, 'text': 'but as your data grows,', 'start': 1987.485, 'duration': 1.721}, {'end': 1997.931, 'text': 'then your learning changes and this is where machine learning and computer science comes into practice and you are able to recalibrate your models.', 'start': 1989.206, 'duration': 8.725}], 'summary': 'Machine learning adapts to customer data, adjusting stock and models as customer base grows.', 'duration': 63.896, 'max_score': 1934.035, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA1934035.jpg'}], 'start': 1339.123, 'title': 'Retail chain data 
analysis', 'summary': 'Discusses a data science analysis of a retail chain, identifying low-performing stores due to high-profile locations and excessive free gifts, leading to recommendations for closure or product adjustment. it also covers various data science use cases, including predicting tiger deaths, flight delay forecasts, medication effectiveness, and dead account reactivation, while outlining essential data science skills such as mathematics, computer science, and domain expertise.', 'chapters': [{'end': 1396.194, 'start': 1339.123, 'title': 'Retail chain data analysis', 'summary': 'Discusses a data science analysis of a retail chain, identifying low-performing stores due to high-profile locations and excessive free gifts, leading to recommendations for closure or product adjustment.', 'duration': 57.071, 'highlights': ['The data science analysis revealed that some stores in the retail chain were underperforming due to high-profile locations and excessive free gifts, leading to recommendations for closure or product adjustment.', 'Millions of items were being sold for 1 rupee to 100 rupee price, including expensive items like televisions, impacting the business performance of the retail store.']}, {'end': 2144.867, 'start': 1396.194, 'title': 'Data science use cases and skills', 'summary': 'Discusses various use cases of data science, including predicting tiger deaths in a national park, flight delay forecasts, medication effectiveness in healthcare, and dead account reactivation in banks. 
it also outlines the essential skills for a data scientist, emphasizing mathematics, computer science, and domain expertise.', 'duration': 748.673, 'highlights': ['Predicting tiger deaths in a national park during rainy seasons, leading to the recommendation of evacuating tigers to safer regions.', 'Forecasting flight delays based on historical and real-time data, considering parameters like source, destination, weather conditions, and geographical suitability for flight travel.', 'Identifying dead accounts in banks and predicting the subset of customers likely to reactivate their accounts based on transaction history and offering incentives, particularly targeting women as more feasible customers for reactivating their accounts.', 'Utilizing data science in healthcare for disease predictions and medication effectiveness, including conducting tests to predict the likelihood of certain diseases and determining the effectiveness of medications through population samples and analysis.', 'Employing data science in sales for offering discounts and demand forecasting, considering historical and seasonal data to accurately forecast product demand and optimize sales strategies.']}], 'duration': 805.744, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA1339123.jpg', 'highlights': ['Millions of items were being sold for 1 rupee to 100 rupee price, impacting the business performance of the retail store.', 'The data science analysis revealed that some stores in the retail chain were underperforming due to high-profile locations and excessive free gifts, leading to recommendations for closure or product adjustment.', 'Forecasting flight delays based on historical and real-time data, considering parameters like source, destination, weather conditions, and geographical suitability for flight travel.', 'Utilizing data science in healthcare for disease predictions and medication effectiveness, including conducting tests to 
predict the likelihood of certain diseases and determining the effectiveness of medications through population samples and analysis.', 'Identifying dead accounts in banks and predicting the subset of customers likely to reactivate their accounts based on transaction history and offering incentives, particularly targeting women as more feasible customers for reactivating their accounts.', 'Employing data science in sales for offering discounts and demand forecasting, considering historical and seasonal data to accurately forecast product demand and optimize sales strategies.', 'Predicting tiger deaths in a national park during rainy seasons, leading to the recommendation of evacuating tigers to safer regions.']}, {'end': 2556.627, 'segs': [{'end': 2188.859, 'src': 'embed', 'start': 2163.261, 'weight': 1, 'content': [{'end': 2169.766, 'text': "you can find out that the correlation between Virat Kohli's batting and his captaincy is 7.3.", 'start': 2163.261, 'duration': 6.505}, {'end': 2174.41, 'text': 'but then it does not mean anything for sport audience or for a cricket player.', 'start': 2169.766, 'duration': 4.644}, {'end': 2178.372, 'text': 'so now you should be able to explain what that means.', 'start': 2174.93, 'duration': 3.442}, {'end': 2183.436, 'text': 'so the mathematical deduction that you have obtained, what does it mean in terms of business?', 'start': 2178.372, 'duration': 5.064}, {'end': 2185.397, 'text': 'so you might be able to find out.', 'start': 2183.436, 'duration': 1.961}, {'end': 2188.859, 'text': "in case of, let's say, telecom, you should be understanding that.", 'start': 2185.397, 'duration': 3.462}], 'summary': "Correlation between Virat Kohli's batting and captaincy is 7.3, but its relevance to sports or cricket is unclear. 
business implications need to be explored, such as in telecom.", 'duration': 25.598, 'max_score': 2163.261, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA2163261.jpg'}, {'end': 2364.188, 'src': 'embed', 'start': 2325.533, 'weight': 0, 'content': [{'end': 2331.995, 'text': 'their work profiles, their purchases, history, their delivery addresses, their transactions, their interest.', 'start': 2325.533, 'duration': 6.462}, {'end': 2335.076, 'text': 'So a lot of information is there only about customer.', 'start': 2332.435, 'duration': 2.641}, {'end': 2343.638, 'text': 'Now which of these parameters or which of these values are actually important and which are the values that are not so important that have to be taken care.', 'start': 2335.316, 'duration': 8.322}, {'end': 2348.899, 'text': 'So Amazon might not be interested in what is your salary for example even if it has that data.', 'start': 2343.718, 'duration': 5.181}, {'end': 2357.001, 'text': 'To predict whether you are going to buy a product or not it might not be interested in your income even if it is able to obtain it from some sources.', 'start': 2349.179, 'duration': 7.822}, {'end': 2362.546, 'text': 'what Amazon would be interested in knowing what are your interest and what is your purse size?', 'start': 2357.561, 'duration': 4.985}, {'end': 2364.188, 'text': 'what is your purchase capacity?', 'start': 2362.546, 'duration': 1.642}], 'summary': 'Amazon focuses on customer interests, purchase capacity, not income or salary data.', 'duration': 38.655, 'max_score': 2325.533, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA2325533.jpg'}, {'end': 2476.992, 'src': 'embed', 'start': 2447.144, 'weight': 3, 'content': [{'end': 2453.867, 'text': 'so this comes when you actually look for sources that are beyond the information that is already provided to you.', 'start': 2447.144, 'duration': 6.723}, 
{'end': 2461.179, 'text': 'A lot of time of a data scientist actually goes into processing and cleaning data and verifying the integrity of the data.', 'start': 2454.733, 'duration': 6.446}, {'end': 2466.603, 'text': 'So it is very important to ascertain whether data is provided to a data scientist is correct or not.', 'start': 2461.339, 'duration': 5.264}, {'end': 2470.026, 'text': 'For example there can be a lot of outliers in the data.', 'start': 2466.803, 'duration': 3.223}, {'end': 2476.992, 'text': "So let's say if I have a data set of 100 items and all of these 100 items have values between 1 and 1000,", 'start': 2470.246, 'duration': 6.746}], 'summary': 'Data scientists spend a lot of time processing and cleaning data to verify its integrity and ensure its correctness, including identifying outliers.', 'duration': 29.848, 'max_score': 2447.144, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA2447144.jpg'}], 'start': 2145.848, 'title': 'Importance of domain expertise in data science and roles of a data scientist', 'summary': 'Highlights the significance of domain expertise in data science, emphasizing the ability to explain insights in various industries like sports and telecom. 
it also discusses the responsibilities of a data scientist, including data modeling, feature selection, data mining, and building predictive models for business decision-making.', 'chapters': [{'end': 2243.131, 'start': 2145.848, 'title': 'Importance of domain expertise in data science', 'summary': 'Emphasizes the significance of domain expertise in data science, highlighting the importance of being able to explain insights drawn from mathematics in the context of specific industries, such as sports and telecom, to provide actionable business value.', 'duration': 97.283, 'highlights': ["Domain expertise is crucial in data science as it enables the explanation of mathematical insights in a business context, such as correlating sports data like Virat Kohli's performance to actionable business implications.", 'Understanding churn analysis in telecom is essential, as it allows for the identification of customer behavior and its impact on business decisions.', 'Businesses seek insightful data that can be translated into meaningful stories and actionable predictions, emphasizing the need for domain knowledge to provide valuable insights.']}, {'end': 2556.627, 'start': 2243.411, 'title': 'Roles of a data scientist', 'summary': 'Discusses the responsibilities of a data scientist, including creating data models, selecting features, data mining, and building predictive models, all to derive valuable insights from large-scale data sets for business decision-making and analytics.', 'duration': 313.216, 'highlights': ['A data scientist is responsible for creating data models, selecting features, and optimizing classifiers using machine learning techniques to derive valuable insights from large-scale data sets, enabling businesses to make informed decisions and conduct analytics.', 'Data scientists are tasked with selecting the right features from large data sets, such as customer information, to build and optimize classifiers using machine learning techniques, ensuring that 
only relevant variables are considered for predictions, which is crucial for businesses.', "The data scientist's role involves extensive data mining to uncover valuable insights from the data, using various techniques to analyze and understand the depth of the information, aiding in predictive analytics and decision-making for businesses.", "Data scientists extend the company's data by integrating third-party sources when necessary, enabling a more comprehensive understanding of the data, such as geographical and socio-political influences, which is crucial for making informed business decisions.", "Verifying the integrity of data and cleaning it is a significant part of a data scientist's responsibilities, as ensuring the accuracy and cleanliness of the data is crucial for reliable analysis and predictive modeling, especially in the presence of outliers and dirty data values."]}], 'duration': 410.779, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA2145848.jpg', 'highlights': ['Domain expertise is crucial in data science for explaining insights in a business context, such as correlating sports data to actionable business implications.', 'Understanding churn analysis in telecom is essential for identifying customer behavior and its impact on business decisions.', 'Businesses seek insightful data that can be translated into meaningful stories and actionable predictions, emphasizing the need for domain knowledge.', 'A data scientist is responsible for creating data models, selecting features, and optimizing classifiers using machine learning techniques.', 'Data scientists are tasked with selecting the right features from large data sets, such as customer information, to build and optimize classifiers using machine learning techniques.', "The data scientist's role involves extensive data mining to uncover valuable insights from the data, aiding in predictive analytics and decision-making for businesses.", "Data 
scientists extend the company's data by integrating third-party sources when necessary, enabling a more comprehensive understanding of the data.", "Verifying the integrity of data and cleaning it is a significant part of a data scientist's responsibilities, ensuring the accuracy and cleanliness of the data for reliable analysis and predictive modeling."]}, {'end': 2830.834, 'segs': [{'end': 2615.567, 'src': 'embed', 'start': 2592.547, 'weight': 0, 'content': [{'end': 2601.576, 'text': 'looking. then, the data sources, data sources and business intelligence were usually structured like I said, SQL, data warehouse or databases like Oracle.', 'start': 2592.547, 'duration': 9.029}, {'end': 2605.839, 'text': 'But in data science what we are looking at is not just structured data.', 'start': 2602.176, 'duration': 3.663}, {'end': 2607.34, 'text': 'we also have unstructured data.', 'start': 2605.839, 'duration': 1.501}, {'end': 2609.642, 'text': 'So data can be coming from anywhere.', 'start': 2607.741, 'duration': 1.901}, {'end': 2611.864, 'text': 'it can be logs from applications.', 'start': 2609.642, 'duration': 2.222}, {'end': 2612.905, 'text': 'it can be cloud data.', 'start': 2611.864, 'duration': 1.041}, {'end': 2614.386, 'text': 'it can be SQL NoSQL.', 'start': 2612.905, 'duration': 1.481}, {'end': 2615.567, 'text': 'it can be text data.', 'start': 2614.386, 'duration': 1.181}], 'summary': 'Data sources in data science include structured and unstructured data from various sources such as logs, cloud, sql, nosql, and text data.', 'duration': 23.02, 'max_score': 2592.547, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA2592547.jpg'}, {'end': 2679.694, 'src': 'embed', 'start': 2637.122, 'weight': 2, 'content': [{'end': 2643.224, 'text': 'there is lot of data in those handwritten forms that people used to fill for manual registration earlier.', 'start': 2637.122, 'duration': 6.102}, {'end': 2648.426, 'text': 'so 
a lot of structured and unstructured data we data scientist have to deal with.', 'start': 2643.224, 'duration': 5.202}, {'end': 2655.988, 'text': 'Approaching business intelligence is more about statistics and visualization, whereas in data science there is statistics, machine learning,', 'start': 2648.926, 'duration': 7.062}, {'end': 2662.149, 'text': 'graph analysis, lot of new technologies that are involved in actually developing those technologies.', 'start': 2655.988, 'duration': 6.161}, {'end': 2669.651, 'text': 'Focus is past and present in business intelligence as we said looking backward and in data science it is present and future.', 'start': 2662.389, 'duration': 7.262}, {'end': 2679.694, 'text': 'Tools used for business intelligence were Pentaho, Microsoft BI, QlikView, whereas in data science we are actually using R, Python, RapidMiner,', 'start': 2669.831, 'duration': 9.863}], 'summary': 'Data science involves more diverse technologies than business intelligence, utilizing tools like R, Python, and RapidMiner.', 'duration': 42.572, 'max_score': 2637.122, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA2637122.jpg'}, {'end': 2766.172, 'src': 'heatmap', 'start': 2719.42, 'weight': 0.703, 'content': [{'end': 2725.804, 'text': 'So commonly used tools, so there are four kind of applications that we typically have in data science.', 'start': 2719.42, 'duration': 6.384}, {'end': 2728.085, 'text': 'The first one is of data analysis.', 'start': 2726.024, 'duration': 2.061}, {'end': 2733.008, 'text': 'So R and SAS are two programming languages that are widely used in data analysis.', 'start': 2728.285, 'duration': 4.723}, {'end': 2737.811, 'text': 'R is an open source, that means it is freely available and it is very extensive.', 'start': 2733.428, 'duration': 4.383}, {'end': 2744.394, 'text': 'So there are a lot of features people have built in, a lot of libraries and algorithms that are readily 
available in R,', 'start': 2737.871, 'duration': 6.523}, {'end': 2746.536, 'text': 'and it is a very growing community.', 'start': 2744.394, 'duration': 2.142}, {'end': 2748.657, 'text': 'There is always some development happening in R.', 'start': 2746.776, 'duration': 1.881}, {'end': 2757.464, 'text': 'Whereas SAS is a licensed product and it is used more commercially, and it is used by big organizations like big banks,', 'start': 2748.857, 'duration': 8.607}, {'end': 2759.766, 'text': 'American Express and Bank of America.', 'start': 2757.464, 'duration': 2.302}, {'end': 2761.628, 'text': 'So they are primarily using SAS.', 'start': 2759.926, 'duration': 1.702}, {'end': 2766.172, 'text': 'Python is picking up very rapidly and it is now replacing R.', 'start': 2762.088, 'duration': 4.084}], 'summary': 'R and sas are widely used in data analysis, with r being open source and python rapidly replacing r.', 'duration': 46.752, 'max_score': 2719.42, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA2719420.jpg'}], 'start': 2556.627, 'title': 'Contrasting bi and data science', 'summary': 'Contrasts business intelligence and data science, emphasizing the differences in perspective, data sources, tools used, focus, and popularity of python for data analysis, while stressing the forward-looking aspect and diverse, unstructured data sources in data science, as well as the significance of data visualization in deriving insights from data.', 'chapters': [{'end': 2637.122, 'start': 2556.627, 'title': 'Bi vs data science', 'summary': 'Discusses the differences between business intelligence and data science, focusing on perspective, data sources, and the nature of data, emphasizing the forward-looking aspect of data science and its incorporation of diverse, unstructured data sources.', 'duration': 80.495, 'highlights': ['Data science emphasizes looking forward and making predictions, while business intelligence focuses on 
historical data and patterns.', 'Data science deals with diverse, unstructured data sources including logs, cloud data, SQL, NoSQL, text data, and social media feeds.', 'Business intelligence primarily utilizes structured data from sources like SQL, data warehouses, and Oracle databases.']}, {'end': 2830.834, 'start': 2637.122, 'title': 'Data science vs business intelligence', 'summary': 'Discusses the differences between data science and business intelligence, highlighting the tools used, the focus on predictions versus reporting, and the growing popularity of python for data analysis. it also emphasizes the importance of data visualization in deriving insights from data.', 'duration': 193.712, 'highlights': ['R and SAS are widely used in data analysis, with R being open source and freely available, while SAS is a licensed product used by big organizations like big banks, American Express and Bank of America.', 'Python is rapidly replacing R as a widely used programming language for data analysis due to its ease of use and flexibility, with a growing community and constant development.', 'Hadoop is widely used for storing and retrieving data in data warehousing, working in conjunction with tools like SQL and Hive.', 'Data visualization is a crucial part of data science, involving the use of various tools to generate insights from data through beautiful graphs, scatter diagrams, histograms, and different kinds of plots.']}], 'duration': 274.207, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA2556627.jpg', 'highlights': ['Python is rapidly replacing R as a widely used programming language for data analysis due to its ease of use and flexibility, with a growing community and constant development.', 'Data science emphasizes looking forward and making predictions, while business intelligence focuses on historical data and patterns.', 'Data science deals with diverse, unstructured data sources including logs, 
cloud data, SQL, NoSQL, text data, and social media feeds.', 'Data visualization is a crucial part of data science, involving the use of various tools to generate insights from data through beautiful graphs, scatter diagrams, histograms, and different kinds of plots.', 'R and SAS are widely used in data analysis, with R being open source and freely available, while SAS is a licensed product used by big organizations like big banks, American Express and Bank of America.', 'Hadoop is widely used for storing and retrieving data in data warehousing, working in conjunction with tools like SQL and Hive.', 'Business intelligence primarily utilizes structured data from sources like SQL, data warehouses, and Oracle databases.']}, {'end': 3182.233, 'segs': [{'end': 2912.447, 'src': 'embed', 'start': 2872.878, 'weight': 0, 'content': [{'end': 2875.598, 'text': 'Now we move on to the life cycle of data science.', 'start': 2872.878, 'duration': 2.72}, {'end': 2882.361, 'text': 'we will actually look at how a data science project is executed and what are the steps in solving a data science problem.', 'start': 2875.598, 'duration': 6.763}, {'end': 2890.483, 'text': 'So this is an interesting example what if we could predict the occurrence of diabetes and take appropriate measures beforehand to prevent it.', 'start': 2883.101, 'duration': 7.382}, {'end': 2895.969, 'text': 'and this gentleman saying that definitely, let me take you through the steps to predict the vulnerable patients.', 'start': 2890.763, 'duration': 5.206}, {'end': 2897.23, 'text': 'this is the problem statement.', 'start': 2895.969, 'duration': 1.261}, {'end': 2912.447, 'text': 'what we are trying to do is that we are going to look at different test results and we are going to predict whether that particular person with different test results or different information about his blood profile is likely to have diabetes and if anything can be done about it.', 'start': 2897.23, 'duration': 15.217}], 'summary': 'Data 
science life cycle: predicting diabetes occurrence and preventive measures.', 'duration': 39.569, 'max_score': 2872.878, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA2872878.jpg'}, {'end': 3145.437, 'src': 'heatmap', 'start': 2913.581, 'weight': 0.931, 'content': [{'end': 2918.622, 'text': "So I hope the problem statement is clear and it's very simple problem statement,", 'start': 2913.581, 'duration': 5.041}, {'end': 2922.623, 'text': "and we'll move on to what are the different steps in solving this problem?", 'start': 2918.622, 'duration': 4.001}, {'end': 2929.964, 'text': 'So the first step in solving this problem is discovery and data preparation these are the two steps that actually go hand in hand.', 'start': 2923.323, 'duration': 6.641}, {'end': 2937.185, 'text': 'So what we do in discovery is that we actually look at data and we understand that what is the kind of data that we have at hand.', 'start': 2930.284, 'duration': 6.901}, {'end': 2940.686, 'text': 'How many values of this data set are numeric?', 'start': 2937.805, 'duration': 2.881}, {'end': 2942.926, 'text': 'how many values are there that are string?', 'start': 2940.686, 'duration': 2.24}, {'end': 2947.768, 'text': 'How many values are there which are actually logical, yes or no, true or false?', 'start': 2943.246, 'duration': 4.522}, {'end': 2950.909, 'text': 'So we look at data and we do that kind of thing.', 'start': 2948.288, 'duration': 2.621}, {'end': 2953.47, 'text': 'We also do lot of data preparation,', 'start': 2951.009, 'duration': 2.461}, {'end': 2961.173, 'text': 'in which we see that whether there are any null values or empty values in data that need to be removed or replaced with some default values.', 'start': 2953.47, 'duration': 7.703}, {'end': 2965.935, 'text': 'What are the some of the outliers that do not look correct, they should be removed.', 'start': 2961.733, 'duration': 4.202}, {'end': 2971.056, 'text': 'If 
there is the data information that is not relevant, we also do a lot of feature selection.', 'start': 2966.515, 'duration': 4.541}, {'end': 2976.745, 'text': "So let's say, if we have 100 variables or 100 types of information stored in that data,", 'start': 2971.136, 'duration': 5.609}, {'end': 2980.932, 'text': 'then which of that information is relevant for our business or use case?', 'start': 2976.745, 'duration': 4.187}, {'end': 2983.537, 'text': 'which information should be removed from that data?', 'start': 2980.932, 'duration': 2.605}, {'end': 2992.538, 'text': 'Then we actually move on to model planning, where we identify that what is the kind of algorithm that we need to apply to this data?', 'start': 2984.249, 'duration': 8.289}, {'end': 2994.64, 'text': 'and we actually generate a model.', 'start': 2992.538, 'duration': 2.102}, {'end': 2998.104, 'text': 'we generate an equation that explains most of my data.', 'start': 2994.64, 'duration': 3.464}, {'end': 3001.647, 'text': 'in case of predictions, if I am making any prediction,', 'start': 2998.104, 'duration': 3.543}, {'end': 3008.153, 'text': 'then we take two or three variables or a number of variables that return me a value that can be predicted.', 'start': 3001.647, 'duration': 6.506}, {'end': 3016.199, 'text': 'So, for example, I can take time of visit in the store, the past history of a customer as another variable.', 'start': 3008.293, 'duration': 7.906}, {'end': 3020.702, 'text': 'and third is whether I am accompanied by my family or not whenever I make purchases.', 'start': 3016.199, 'duration': 4.503}, {'end': 3024.844, 'text': 'to determine whether I am likely to buy a shirt or not.', 'start': 3021.182, 'duration': 3.662}, {'end': 3028.786, 'text': 'so this kind of model we develop in model planning phase.', 'start': 3024.844, 'duration': 3.942}, {'end': 3030.327, 'text': 'then we do a model building.', 'start': 3028.786, 'duration': 1.541}, {'end': 3032.549, 'text': 'so we look at some 
assumptions.', 'start': 3030.327, 'duration': 2.222}, {'end': 3038.372, 'text': 'we validate our assumptions and we test it against the data and see whether our model is valid or not.', 'start': 3032.549, 'duration': 5.823}, {'end': 3040.793, 'text': 'and we check it against the test data.', 'start': 3038.372, 'duration': 2.421}, {'end': 3043.355, 'text': 'so we have a sample data with us.', 'start': 3040.793, 'duration': 2.562}, {'end': 3044.876, 'text': 'we have the historical data.', 'start': 3043.355, 'duration': 1.521}, {'end': 3051.861, 'text': 'we actually take subset of that data for our model planning purpose and for model building and validation.', 'start': 3044.876, 'duration': 6.985}, {'end': 3056.307, 'text': 'we actually take another subset of data and test our hypothesis on that data.', 'start': 3051.861, 'duration': 4.446}, {'end': 3057.909, 'text': 'This is called model building.', 'start': 3056.788, 'duration': 1.121}, {'end': 3065.139, 'text': 'Finally, we operationalize that data, we move it into production and my real data is created.', 'start': 3058.49, 'duration': 6.649}, {'end': 3071.741, 'text': 'We actually run our algorithm on that data and we communicate the results to our end users.', 'start': 3065.579, 'duration': 6.162}, {'end': 3075.762, 'text': 'what are the insights, or what are the stories that we have found out from that data?', 'start': 3071.741, 'duration': 4.021}, {'end': 3078.682, 'text': 'what are the predictions that we are going to make for that data?', 'start': 3075.762, 'duration': 2.92}, {'end': 3084.304, 'text': 'what is the forecasting that we can make based on the data science that we have applied on the data?', 'start': 3078.682, 'duration': 5.622}, {'end': 3089.971, 'text': 'So, these are the steps that are there in life cycle of any data science project.', 'start': 3084.824, 'duration': 5.147}, {'end': 3095.918, 'text': 'So, if you pick up data science project from any domain it would go through these steps 
typically.', 'start': 3090.391, 'duration': 5.527}, {'end': 3100.693, 'text': 'So now we are actually going to detail of all of these phases.', 'start': 3097.15, 'duration': 3.543}, {'end': 3106.357, 'text': 'I will cover it up so that you understand what happens behind the scenes in discovery.', 'start': 3100.913, 'duration': 5.444}, {'end': 3109.56, 'text': 'So in discovery we acquire data from different sources.', 'start': 3106.437, 'duration': 3.123}, {'end': 3114.403, 'text': 'So data can be coming from logs of web servers, it can be social media data,', 'start': 3110.04, 'duration': 4.363}, {'end': 3119.027, 'text': 'census data related to population or data streamed from other online sources.', 'start': 3114.403, 'duration': 4.624}, {'end': 3123.951, 'text': 'So we combine all of this data from different sources and then we move to.', 'start': 3119.047, 'duration': 4.904}, {'end': 3125.292, 'text': 'we actually identify that.', 'start': 3123.951, 'duration': 1.341}, {'end': 3130.398, 'text': 'So, for this particular use case, this is the kind of data that we obtain.', 'start': 3125.692, 'duration': 4.706}, {'end': 3137.006, 'text': 'So, on your right side here you can see that the data is generally available in CSV format and we have these attributes.', 'start': 3130.418, 'duration': 6.588}, {'end': 3140.631, 'text': 'So, we define these kind of attributes what is the meaning of these attributes.', 'start': 3137.046, 'duration': 3.585}, {'end': 3145.437, 'text': 'So NPreg, for example, means number of times a woman has been pregnant.', 'start': 3140.931, 'duration': 4.506}], 'summary': 'Data science project lifecycle involves data discovery, preparation, model planning, building, and operationalization.', 'duration': 231.856, 'max_score': 2913.581, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA2913581.jpg'}, {'end': 2992.538, 'src': 'embed', 'start': 2961.733, 'weight': 3, 'content': [{'end': 
2965.935, 'text': 'What are the some of the outliers that do not look correct, they should be removed.', 'start': 2961.733, 'duration': 4.202}, {'end': 2971.056, 'text': 'If there is the data information that is not relevant, we also do a lot of feature selection.', 'start': 2966.515, 'duration': 4.541}, {'end': 2976.745, 'text': "So let's say, if we have 100 variables or 100 types of information stored in that data,", 'start': 2971.136, 'duration': 5.609}, {'end': 2980.932, 'text': 'then which of that information is relevant for our business or use case?', 'start': 2976.745, 'duration': 4.187}, {'end': 2983.537, 'text': 'which information should be removed from that data?', 'start': 2980.932, 'duration': 2.605}, {'end': 2992.538, 'text': 'Then we actually move on to model planning, where we identify that what is the kind of algorithm that we need to apply to this data?', 'start': 2984.249, 'duration': 8.289}], 'summary': 'Identifying outliers, removing irrelevant data, and selecting relevant features are essential for model planning and algorithm selection.', 'duration': 30.805, 'max_score': 2961.733, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA2961733.jpg'}, {'end': 3174.426, 'src': 'embed', 'start': 3145.437, 'weight': 4, 'content': [{'end': 3149.382, 'text': 'glucose level indicates the plasma glucose concentration.', 'start': 3145.437, 'duration': 3.945}, {'end': 3151.024, 'text': 'BP denotes blood pressure.', 'start': 3149.382, 'duration': 1.642}, {'end': 3156.05, 'text': 'So we actually look at the database schema and understand what is the meaning of that data.', 'start': 3151.044, 'duration': 5.006}, {'end': 3159.354, 'text': 'what are the different kinds of parameters that are present in that data?', 'start': 3156.05, 'duration': 3.304}, {'end': 3166.34, 'text': 'again, there is a very interesting attribute called income over here, which has nothing to do with the problem that we are trying to 
solve.', 'start': 3159.674, 'duration': 6.666}, {'end': 3174.426, 'text': 'and since you are combining data from multiple sources, it is quite likely that you would be seeing some kind of mixed information in this data.', 'start': 3166.34, 'duration': 8.086}], 'summary': 'Reviewing the database schema to understand each attribute and to spot irrelevant or mixed information introduced by combining multiple sources.', 'duration': 28.989, 'max_score': 3145.437, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA3145437.jpg'}], 'start': 2831.696, 'title': 'Tableau and data science life cycle', 'summary': 'Discusses the widespread use of Tableau in data visualization, along with other tools like Spark, Mahout, and Azure ML Studio. It also covers the data science project life cycle, including steps such as discovery, data preparation, model planning, model building, and operationalization, with a focus on predicting diabetes occurrence.', 'chapters': [{'end': 2871.937, 'start': 2831.696, 'title': 'Tableau for data visualization', 'summary': 'Discusses the widespread use of Tableau in data visualization, highlighting its benefits and the relevance of other tools like Spark, Mahout, and Azure ML Studio in data analysis and machine learning.', 'duration': 40.241, 'highlights': ['Tableau is widely used in data visualization, enabling experts to create various graphs and pictorial representations using drag-and-drop techniques.', 'Spark, Mahout, and Azure ML Studio are mentioned as relevant tools for machine learning and data analysis, with a connection to Hadoop.']}, {'end': 3182.233, 'start': 2872.878, 'title': 'Data science life cycle', 'summary': 'Discusses the life cycle of a data science project, including steps such as discovery and data preparation, model planning, model building, and operationalization, with a focus on predicting diabetes occurrence.', 'duration': 309.355, 'highlights': ['The first step in solving the problem is discovery and data preparation, which involves 
understanding the data types, handling null values, outliers, and selecting relevant features, and is essential for executing a data science project.', 'Model planning phase involves identifying the algorithm, generating a model, and developing equations to explain and make predictions based on the data, thereby contributing to the execution of a data science project.', 'Model building phase includes validating assumptions, testing the model against data, and operationalizing the data to generate real results and communicate insights and predictions to end users, which is a critical step in the data science life cycle.', 'The chapter delves into the acquisition of data from various sources, such as web server logs, social media, census data, and online sources, and the process of understanding the database schema and attributes, highlighting the importance of data acquisition and understanding for a data science project.']}], 'duration': 350.537, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA2831696.jpg', 'highlights': ['Tableau enables experts to create various graphs and pictorial representations using drag and drop techniques.', 'PARC, Mahout, and Azure ML studio are relevant tools for machine learning and data analysis.', 'Discovery and data preparation involve understanding data types, handling null values, outliers, and selecting relevant features.', 'Model building phase includes validating assumptions, testing the model against data, and operationalizing the data to generate real results.', 'The chapter delves into the acquisition of data from various sources, such as web server logs, social media, census data, and online sources.']}, {'end': 3776.814, 'segs': [{'end': 3330.92, 'src': 'heatmap', 'start': 3254.34, 'weight': 0.751, 'content': [{'end': 3256.662, 'text': 'so there are different approaches that can be taken.', 'start': 3254.34, 'duration': 2.322}, {'end': 3262.847, 'text': 'so, for 
example, this high value in BP might be replaced by the average of the rest of the values.', 'start': 3256.662, 'duration': 6.185}, {'end': 3266.67, 'text': 'the string value was replaced by a numerical value, and so on and so forth.', 'start': 3262.847, 'duration': 3.823}, {'end': 3269.711, 'text': 'so then next we move on to model planning.', 'start': 3267.31, 'duration': 2.401}, {'end': 3271.611, 'text': 'so here we actually determine that.', 'start': 3269.711, 'duration': 1.9}, {'end': 3273.431, 'text': 'what are the tools that we are going to use?', 'start': 3271.611, 'duration': 1.82}, {'end': 3281.613, 'text': 'we also do some sort of exploratory data analysis where we look at what is the mean of different parameters, what is the median?', 'start': 3273.431, 'duration': 8.182}, {'end': 3284.314, 'text': 'what is the mode, the most frequently occurring item?', 'start': 3281.613, 'duration': 2.701}, {'end': 3285.694, 'text': 'what are the types of items?', 'start': 3284.314, 'duration': 1.38}, {'end': 3287.915, 'text': 'what does the data look like prima facie?', 'start': 3285.694, 'duration': 2.221}, {'end': 3295.857, 'text': 'so we do it in exploratory data analysis and we use a lot of visualization tools, a lot of graphs, to actually understand the kind of data that we have.', 'start': 3287.915, 'duration': 7.942}, {'end': 3302.221, 'text': 'The tools we primarily use for exploratory data analysis are R, SAS and W.', 'start': 3296.497, 'duration': 5.724}, {'end': 3307.904, 'text': 'So this is how visualization techniques show us what the data looks like.', 'start': 3302.681, 'duration': 5.223}, {'end': 3310.926, 'text': 'So what they have done over here is different parameters.', 'start': 3308.104, 'duration': 2.822}, {'end': 3316.53, 'text': 'they have drawn graphs for different kinds of parameters for glucose level, for BP, for BMI and so on.', 'start': 3310.926, 'duration': 5.604}, {'end': 3319.472, 'text': 'next part is about model building.', 
'start': 3317.41, 'duration': 2.062}, {'end': 3320.893, 'text': 'here what they do is like.', 'start': 3319.472, 'duration': 1.421}, {'end': 3323.255, 'text': 'I said we separate the data into two part.', 'start': 3320.893, 'duration': 2.362}, {'end': 3328.679, 'text': 'there is one part, which is called training data set, on which we actually do our model building.', 'start': 3323.255, 'duration': 5.424}, {'end': 3330.92, 'text': 'we create our model and we test it.', 'start': 3328.679, 'duration': 2.241}], 'summary': 'Data preprocessing involves replacing values, followed by exploratory data analysis using visualization tools like r, sas, and w. model planning includes separating data into training and testing sets for model building.', 'duration': 76.58, 'max_score': 3254.34, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA3254340.jpg'}, {'end': 3281.613, 'src': 'embed', 'start': 3256.662, 'weight': 4, 'content': [{'end': 3262.847, 'text': 'so, for example, this high value in BP might be replaced by the average value of rest of the values.', 'start': 3256.662, 'duration': 6.185}, {'end': 3266.67, 'text': 'the string value was replaced by numerical, and so on so forth.', 'start': 3262.847, 'duration': 3.823}, {'end': 3269.711, 'text': 'so then next we move on to model planning.', 'start': 3267.31, 'duration': 2.401}, {'end': 3271.611, 'text': 'so here we actually determine that.', 'start': 3269.711, 'duration': 1.9}, {'end': 3273.431, 'text': 'what are the tools that we are going to use?', 'start': 3271.611, 'duration': 1.82}, {'end': 3281.613, 'text': 'we also do some sort of exploratory data analysis where we look at what is the mean size of different parameters, what is the median?', 'start': 3273.431, 'duration': 8.182}], 'summary': 'Data preprocessing involved replacing values and exploring data parameters for model planning.', 'duration': 24.951, 'max_score': 3256.662, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA3256662.jpg'}, {'end': 3395.869, 'src': 'embed', 'start': 3369.689, 'weight': 5, 'content': [{'end': 3377.335, 'text': 'generally, most of the tools are like that, barring data visualization tools like Tableau, that are good for doing all of these roles.', 'start': 3369.689, 'duration': 7.646}, {'end': 3379.817, 'text': 'so this is an example of a decision tree.', 'start': 3377.335, 'duration': 2.482}, {'end': 3382.879, 'text': 'so they have created a classification based on a decision tree,', 'start': 3379.817, 'duration': 3.062}, {'end': 3390.124, 'text': 'and these are the green values that show that the results are okay and the red values that we do not want to consider for our result.', 'start': 3382.879, 'duration': 7.245}, {'end': 3395.869, 'text': 'so this is used when you want to reach a decision which is a yes/no or true/false kind of decision.', 'start': 3390.124, 'duration': 5.745}], 'summary': 'A decision tree classification example is shown, with green values marking acceptable results and red values marking results to exclude; decision trees suit yes/no or true/false decisions.', 'duration': 26.18, 'max_score': 3369.689, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA3369689.jpg'}, {'end': 3445.494, 'src': 'embed', 'start': 3416.451, 'weight': 7, 'content': [{'end': 3420.554, 'text': 'So these are the kinds of insights that you can draw using these kinds of models.', 'start': 3416.451, 'duration': 4.103}, {'end': 3424.197, 'text': 'Of course these models are not as simple as shown in this example.', 'start': 3420.774, 'duration': 3.423}, {'end': 3433.624, 'text': 'They can be very complex and you might have to simplify them or run your algorithm several times to reach a stage where this model becomes this simple.', 'start': 3424.317, 'duration': 9.307}, {'end': 3439.069, 'text': 'Next is you operationalize, you derive 
final reports, briefings,', 'start': 3434.645, 'duration': 4.424}, {'end': 3445.494, 'text': 'and you do the documentation and you work on the real data and then you give the insights to your customers.', 'start': 3439.069, 'duration': 6.425}], 'summary': 'Data models can yield insights, but may require simplification and multiple runs for accuracy. operationalize for final reports, briefings, and customer insights.', 'duration': 29.043, 'max_score': 3416.451, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA3416451.jpg'}, {'end': 3619.506, 'src': 'embed', 'start': 3595.625, 'weight': 2, 'content': [{'end': 3602.933, 'text': 'what you did till now was you took some historical data and you divided into training and test data and evaluated your model.', 'start': 3595.625, 'duration': 7.308}, {'end': 3613.641, 'text': 'now you are actually deploying your model into real time and whatever new data is being created is added to your problem data set and then again your data algorithm is improving itself.', 'start': 3602.933, 'duration': 10.708}, {'end': 3619.506, 'text': 'you again run your test and see whether, in the light of new data that is coming, is my model still holding good,', 'start': 3613.641, 'duration': 5.865}], 'summary': 'Deployed model uses new data to improve and tested for accuracy.', 'duration': 23.881, 'max_score': 3595.625, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA3595625.jpg'}, {'end': 3736.273, 'src': 'embed', 'start': 3707.575, 'weight': 0, 'content': [{'end': 3710.277, 'text': 'what are the different tools that are used for data science?', 'start': 3707.575, 'duration': 2.702}, {'end': 3713.339, 'text': 'what are the skills that are required for being a data scientist?', 'start': 3710.277, 'duration': 3.062}, {'end': 3717.302, 'text': 'and then we looked at some example, use cases from different industries.', 'start': 3713.579, 
'duration': 3.723}, {'end': 3724.889, 'text': 'and finally we took this use case, a real-life problem statement, where we solved a problem for the medical domain.', 'start': 3717.302, 'duration': 7.587}, {'end': 3730.411, 'text': 'so if you have any questions, please feel free to ask them. All right.', 'start': 3724.889, 'duration': 5.522}, {'end': 3736.273, 'text': 'if not, then we can conclude this session and you can actually go through the course details.', 'start': 3730.411, 'duration': 5.862}], 'summary': 'Tools, skills, use cases, and problem-solving in data science presented with a real-life medical domain example.', 'duration': 28.698, 'max_score': 3707.575, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA3707575.jpg'}, {'end': 3776.814, 'src': 'embed', 'start': 3755.038, 'weight': 1, 'content': [{'end': 3760.862, 'text': 'So I hope you have a good journey with data science and I also hope that you enjoyed this session.', 'start': 3755.038, 'duration': 5.824}, {'end': 3762.824, 'text': 'Thank you very much for joining in.', 'start': 3761.463, 'duration': 1.361}, {'end': 3766.727, 'text': 'I hope you enjoyed listening to this video.', 'start': 3765.085, 'duration': 1.642}, {'end': 3772.471, 'text': 'Please be kind enough to like it and you can comment any of your doubts and queries and we will reply to them at the earliest.', 'start': 3767.107, 'duration': 5.364}, {'end': 3776.354, 'text': 'do look out for more videos in our playlist and subscribe to our Edureka channel to learn more.', 'start': 3772.471, 'duration': 3.883}, {'end': 3776.814, 'text': 'happy learning.', 'start': 3776.354, 'duration': 0.46}], 'summary': 'Enjoyed data science journey? 
like, comment, subscribe for more learning.', 'duration': 21.776, 'max_score': 3755.038, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA3755038.jpg'}], 'start': 3182.233, 'title': 'Data science model building and problem solving', 'summary': 'Discusses the data science model building process, emphasizing data cleaning, exploratory data analysis, model planning, and building using various tools, and the process of dividing data into training and test sets, building and evaluating models, deploying the model, and communicating results, while highlighting the importance of testing, evaluating, and deploying models for optimal performance.', 'chapters': [{'end': 3503.735, 'start': 3182.233, 'title': 'Data science model building process', 'summary': 'Discusses the data science model building process, emphasizing the importance of data cleaning, exploratory data analysis, model planning, model building using various tools, and deriving insights from the models.', 'duration': 321.502, 'highlights': ['The chapter emphasizes the importance of data cleaning to address inconsistencies and outliers in the dataset, such as replacing abnormal values and removing unnecessary columns, with examples of cleaning up from the use case provided.', 'Exploratory data analysis involves using visualization tools like R, SAS, and W to understand the data distribution and characteristics, including mean, median, and most frequently occurring items for different parameters like glucose level, BP, and BMI.', 'The model building phase includes separating the data into training and testing datasets, applying techniques like classification, clustering, regression using tools such as R, SAS, Python, and SPCS, and validating the model on the testing dataset.', 'The chapter explains the use of decision trees for classification-based models, providing examples of decision criteria and insights drawn from the models, with a mention of the 
complexity and simplification of models.', 'The operationalization phase involves deriving final reports, briefings, documentation, and providing insights to customers, with a discussion on choosing suitable algorithms based on the problem set and evaluating model optimality.']}, {'end': 3776.814, 'start': 3503.875, 'title': 'Data science problem solving', 'summary': 'Discusses the process of dividing data into training and test sets, building and evaluating models, deploying the model in a live environment, and communicating the results, while highlighting the importance of testing, evaluating, and deploying models using various algorithms for optimal performance.', 'duration': 272.939, 'highlights': ["The process involves dividing data into training and test sets and building models based on the training data, then testing the model's predictions on the test data.", "Mathematical tests are used to evaluate the models, with specific values indicating the model's quality, such as test results ranging from 0.05 to 0.1 signifying a good model.", 'Deploying the model into a real-time environment, continuously improving it with new data, and communicating the results and model explanation to stakeholders are crucial steps in the process.']}], 'duration': 594.581, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/S-T2-ir6TPA/pics/S-T2-ir6TPA3182233.jpg', 'highlights': ['The operationalization phase involves deriving final reports, briefings, documentation, and providing insights to customers, with a discussion on choosing suitable algorithms based on the problem set and evaluating model optimality.', 'The chapter emphasizes the importance of data cleaning to address inconsistencies and outliers in the dataset, such as replacing abnormal values and removing unnecessary columns, with examples of cleaning up from the use case provided.', 'The model building phase includes separating the data into training and testing datasets, applying techniques 
like classification, clustering, regression using tools such as R, SAS, Python, and SPCS, and validating the model on the testing dataset.', "The process involves dividing data into training and test sets and building models based on the training data, then testing the model's predictions on the test data.", 'Deploying the model into a real-time environment, continuously improving it with new data, and communicating the results and model explanation to stakeholders are crucial steps in the process.', 'Exploratory data analysis involves using visualization tools like R, SAS, and W to understand the data distribution and characteristics, including mean, median, and most frequently occurring items for different parameters like glucose level, BP, and BMI.', 'The chapter explains the use of decision trees for classification-based models, providing examples of decision criteria and insights drawn from the models, with a mention of the complexity and simplification of models.', "Mathematical tests are used to evaluate the models, with specific values indicating the model's quality, such as test results ranging from 0.05 to 0.1 signifying a good model."]}], 'highlights': ['Data science enables decision-making and predictive approaches.', 'The need for data science is driven by the rapid creation of data in both structured and unstructured forms.', 'The impact of data analysis is evident in the retail domain, where data science is used to optimize product placement and enhance business.', 'The need for data science arises from analyzing hidden patterns, making predictions, and drawing insights from data.', 'Data science mechanisms enabled the identification of high LTV customers and facilitated customer segmentation for banks and financial institutions.', 'Amazon segments customer profiles for personalized recommendations, improving stock delivery.', 'Python is rapidly replacing R as a widely used programming language for data analysis due to its ease of use and 
flexibility, with a growing community and constant development.', 'Tableau enables experts to create various graphs and pictorial representations using drag and drop techniques.', 'The operationalization phase involves deriving final reports, briefings, documentation, and providing insights to customers, with a discussion on choosing suitable algorithms based on the problem set and evaluating model optimality.', 'The chapter emphasizes the importance of data cleaning to address inconsistencies and outliers in the dataset, such as replacing abnormal values and removing unnecessary columns, with examples of cleaning up from the use case provided.']}