title
Intro to Data Visualization with R & ggplot2

description
In this webinar, we will provide an introduction to data visualization with the ggplot2 package. The focus of the webinar will be using ggplot2 to analyze your data visually with a specific focus on discovering the underlying signals/patterns of your business.

The R programming language is experiencing rapid increases in popularity and wide adoption across industries. This popularity is due, in part, to R’s rich and powerful data visualization capabilities. While tools like Excel, Power BI, and Tableau are often the go-to solutions for data visualizations, none of these tools can compete with R in terms of the sheer breadth of, and control over, crafted data visualizations. As an example, R’s ggplot2 package provides the R programmer with dozens of print-quality visualizations – where any visualization can be heavily customized with a minimal amount of code.

In this talk attendees will learn how to:
• Craft ggplot visualizations, including customization of rendered output.
• Choose optimal visualizations for the type of data and the nature of the analysis at hand.
• Leverage ggplot2’s powerful segmentation capabilities to achieve “visual drill-in of data”.
• Export ggplot2 visualizations from RStudio for use in documents and presentations.

Repository: https://code.datasciencedojo.com/datasciencedojo/tutorials/tree/master/Introduction%20to%20Data%20Visualization%20with%20R%20and%20ggplot2

Table of Contents:
0:00 Introduction
6:19 Titanic dataset
14:07 ggplot2
27:35 Data analysis
32:45 Factor variables
33:10 Hypothesis data
46:26 Visualization
54:04 Age
56:34 Data visualization

--
At Data Science Dojo, we believe data science is for everyone. Our data science trainings have been attended by more than 10,000 employees from over 2,500 companies globally, including many leaders in tech like Microsoft, Google, and Facebook.
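The workflow the description outlines (craft a plot, segment it, customize the output) can be sketched in a few lines of R. This is a minimal illustration, not the webinar's own code: the six rows below are made-up toy values rather than real passenger records, and the column names (Survived, Pclass, Sex, Age) are an assumption based on the Kaggle Titanic convention.

```r
library(ggplot2)

# Hypothetical toy rows for illustration only; these are NOT real passenger
# records. Column names are assumed to follow the Kaggle Titanic convention.
titanic <- data.frame(
  Survived = factor(c(0, 1, 1, 0, 0, 1), levels = c(0, 1),
                    labels = c("Perished", "Survived")),
  Pclass   = factor(c(3, 1, 2, 3, 3, 1)),
  Sex      = factor(c("male", "female", "female", "male", "male", "female")),
  Age      = c(22, 38, 26, 35, 54, 4)
)

# Data + aesthetic mapping + one geom layer is enough to get a plot.
# facet_wrap() adds the segmentation ("visual drill-in") by ticket class,
# and theme_bw() / labs() customize the rendered output.
p <- ggplot(titanic, aes(x = Sex, fill = Survived)) +
  geom_bar() +
  facet_wrap(~ Pclass) +
  theme_bw() +
  labs(
    y = "Passenger count",
    title = "Survival by sex and ticket class (toy data)"
  )
```

Printing `p` in an interactive session renders the chart. For use in documents, ggplot2's `ggsave()` can write it to a file, for example `ggsave("survival.png", plot = p, width = 6, height = 4)`; the file name and dimensions here are arbitrary.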

detail
{'title': 'Intro to Data Visualization with R & ggplot2', 'heatmap': [{'end': 1674.227, 'start': 1582.785, 'weight': 0.789}, {'end': 2009.561, 'start': 1922.581, 'weight': 0.94}, {'end': 2610.098, 'start': 2514.995, 'weight': 0.817}], 'summary': 'Tutorial on data visualization with r and ggplot2, led by dave langer, provides an 80% useful insight into ggplot2, using the titanic dataset to analyze survival patterns and creating quality visualizations, revealing 62% perished and 38% survived, with insights on class, gender, and age survival rates.', 'chapters': [{'end': 382.084, 'segs': [{'end': 142.977, 'src': 'embed', 'start': 81.914, 'weight': 0, 'content': [{'end': 90.117, 'text': "Dave has trained hundreds of working professionals via Data Science Dojo's unique five-day boot camp format and has trained thousands more via his YouTube tutorials.", 'start': 81.914, 'duration': 8.203}, {'end': 93.858, 'text': 'Prior to joining Data Science Dojo, Dave worked at Microsoft,', 'start': 90.777, 'duration': 3.081}, {'end': 101.281, 'text': "where he led a technical program management team accountable for all the data systems used to run Microsoft's $10 billion supply chain operations.", 'start': 93.858, 'duration': 7.423}, {'end': 106.883, 'text': "Dave joined Data Science Dojo to realize the company's mission of data science for everyone.", 'start': 102.001, 'duration': 4.882}, {'end': 114.945, 'text': "It is Dave's belief that you do not need a PhD in statistics or machine learning to learn data science and apply it to your daily work to derive business value.", 'start': 107.383, 'duration': 7.562}, {'end': 122.387, 'text': 'Dave has experience across numerous analytical technologies and techniques, but his current focus areas are text analytics,', 'start': 115.665, 'duration': 6.722}, {'end': 124.688, 'text': 'event log mining and mathematical programming.', 'start': 122.387, 'duration': 2.301}, {'end': 128.11, 'text': "Dave's true passion, however, is teaching others 
data science.", 'start': 125.288, 'duration': 2.822}, {'end': 133.773, 'text': 'Feel free to connect with Dave via LinkedIn, YouTube, and Twitter if he can be of help in your data science journey.', 'start': 128.57, 'duration': 5.203}, {'end': 136.555, 'text': 'Dave, welcome to this Data Science Dojo webinar.', 'start': 134.254, 'duration': 2.301}, {'end': 138.955, 'text': 'Thank you, Blair, for that introduction.', 'start': 137.554, 'duration': 1.401}, {'end': 142.977, 'text': "It's a pleasure to be here today to talk to everyone about ggplot2,", 'start': 139.735, 'duration': 3.242}], 'summary': "Dave has trained hundreds via boot camp and thousands via youtube, led data systems for microsoft's $10b supply chain operations, believes in data science for everyone, and has a passion for teaching.", 'duration': 61.063, 'max_score': 81.914, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD481914.jpg'}, {'end': 195.298, 'src': 'embed', 'start': 164.881, 'weight': 4, 'content': [{'end': 166.582, 'text': "So first up, I'm going to assume the following.", 'start': 164.881, 'duration': 1.701}, {'end': 169.663, 'text': "I'm going to assume that you are experienced with R coding.", 'start': 167.582, 'duration': 2.081}, {'end': 175.246, 'text': "I'm not going to assume that you're an expert, but I'm going to assume that you can hack, that you can understand R code when you see it.", 'start': 170.324, 'duration': 4.922}, {'end': 182.65, 'text': "Because we don't have a lot of time at our disposal, so we're going to focus mainly on how to work with ggplot2 and not so much about R syntax.", 'start': 175.886, 'duration': 6.764}, {'end': 187.073, 'text': "Next up, I'm going to assume that you have some data visualization knowledge.", 'start': 183.55, 'duration': 3.523}, {'end': 195.298, 'text': "For example, what is a histogram? Or what is a box and whisker plot? 
We're not going to talk about the actual visualizations themselves.", 'start': 187.573, 'duration': 7.725}], 'summary': 'Assuming basic r coding and data visualization knowledge to focus on ggplot2.', 'duration': 30.417, 'max_score': 164.881, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD4164881.jpg'}, {'end': 272.093, 'src': 'embed', 'start': 248.191, 'weight': 5, 'content': [{'end': 257.738, 'text': "We're really gonna use the 80-20 rule here to talk about ggplot2 and focus on the kinds of visualizations and the kinds of questions that you want to answer using ggplot2..", 'start': 248.191, 'duration': 9.547}, {'end': 266.307, 'text': 'If you get interested in ggplot2, and I hope that you will I will provide resources later in the deck for you to learn more,', 'start': 259.261, 'duration': 7.046}, {'end': 272.093, 'text': 'more in-depth resources to get better with ggplot2, to answer all kinds of different types of business questions that you may have.', 'start': 266.307, 'duration': 5.786}], 'summary': 'Using the 80-20 rule to focus on ggplot2 visualizations and answering business questions.', 'duration': 23.902, 'max_score': 248.191, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD4248191.jpg'}, {'end': 354.357, 'src': 'embed', 'start': 325.147, 'weight': 6, 'content': [{'end': 331.016, 'text': "If you wish to follow along either during this presentation or after the fact, you're going to need the following.", 'start': 325.147, 'duration': 5.869}, {'end': 333.821, 'text': "You're going to need R and you are going to need RStudio.", 'start': 331.217, 'duration': 2.604}, {'end': 337.478, 'text': 'Now, strictly speaking, RStudio is optional.', 'start': 334.915, 'duration': 2.563}, {'end': 342.323, 'text': 'You absolutely have to have the R programming language, of course, but RStudio is an optional add-on.', 'start': 337.918, 'duration': 4.405}, {'end': 
344.706, 'text': 'I would highly recommend that you use RStudio.', 'start': 342.984, 'duration': 1.722}, {'end': 347.449, 'text': 'I will be doing the entire demo in RStudio,', 'start': 344.906, 'duration': 2.543}, {'end': 354.357, 'text': 'and I always recommend to folks that they use RStudio because it does make your R coding experience a lot more productive and a lot more enjoyable.', 'start': 347.449, 'duration': 6.908}], 'summary': 'R and rstudio are essential for the presentation. rstudio is recommended for a more productive and enjoyable coding experience.', 'duration': 29.21, 'max_score': 325.147, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD4325147.jpg'}], 'start': 5.626, 'title': 'Data visualization with r and ggplot2', 'summary': 'Introduces a live tutorial on introductory data visualization with r and ggplot2, emphasizing the 80% useful aspects of ggplot2 and the prerequisites for following along with the presentation. it is led by dave langer, vice president of data science at data science dojo, who aims to make data science accessible to everyone.', 'chapters': [{'end': 163.69, 'start': 5.626, 'title': 'Intro to data visualization with r', 'summary': "Introduces data science dojo's live tutorial on introductory data visualization with r and ggplot2, led by dave langer, vice president of data science at data science dojo, focusing on ggplot2 and its application in their boot camps and his belief in making data science accessible to everyone.", 'duration': 158.064, 'highlights': ["Dave Langer, Vice President of Data Science at Data Science Dojo, has trained hundreds of working professionals via Data Science Dojo's unique five-day boot camp format and has trained thousands more via his YouTube tutorials. 
Dave Langer's extensive experience in training professionals through boot camps and YouTube tutorials demonstrates his expertise in the field.", "Dave Langer worked at Microsoft, where he led a technical program management team accountable for all the data systems used to run Microsoft's $10 billion supply chain operations. Dave Langer's experience in leading a team accountable for managing data systems for a $10 billion supply chain operations at Microsoft highlights his practical experience in handling large-scale data operations.", "Dave Langer's current focus areas are text analytics, event log mining, and mathematical programming. Dave Langer's focus on text analytics, event log mining, and mathematical programming showcases his expertise in these areas, which he may cover in the tutorial.", "Dave Langer's belief that you do not need a PhD in statistics or machine learning to learn data science and apply it to your daily work to derive business value. Dave Langer's emphasis on making data science accessible to everyone, regardless of their academic background, reflects the inclusive nature of the tutorial."]}, {'end': 382.084, 'start': 164.881, 'title': 'Accelerate data visualization with ggplot2', 'summary': 'Emphasizes the assumption of r coding experience, basic data visualization knowledge, and interest in accelerating data visualizations, focusing on 80% useful aspects of ggplot2. it also outlines the prerequisites for following along with the presentation and emphasizes the usefulness of rstudio.', 'duration': 217.203, 'highlights': ['The chapter emphasizes the assumption of R coding experience, basic data visualization knowledge, and interest in accelerating data visualizations. The chapter assumes the audience has R coding experience, basic data visualization knowledge, and an interest in accelerating data visualizations in day-to-day work.', 'The chapter focuses on 80% useful aspects of ggplot2 and the 20% that is useful 80% of the time. 
The chapter highlights the use of the 80-20 rule to focus on the 80% useful aspects of ggplot2 and the types of visualizations and questions that can be addressed using ggplot2.', 'The chapter outlines the prerequisites for following along with the presentation and emphasizes the usefulness of RStudio. The chapter details the prerequisites for following along, including the need for R and RStudio, with a recommendation for using RStudio due to its productivity and enjoyment benefits.']}], 'duration': 376.458, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD45626.jpg', 'highlights': ["Dave Langer's extensive experience in training professionals through boot camps and YouTube tutorials demonstrates his expertise in the field.", "Dave Langer's experience in leading a team accountable for managing data systems for a $10 billion supply chain operations at Microsoft highlights his practical experience in handling large-scale data operations.", "Dave Langer's focus on text analytics, event log mining, and mathematical programming showcases his expertise in these areas, which he may cover in the tutorial.", "Dave Langer's emphasis on making data science accessible to everyone, regardless of their academic background, reflects the inclusive nature of the tutorial.", 'The chapter assumes the audience has R coding experience, basic data visualization knowledge, and an interest in accelerating data visualizations in day-to-day work.', 'The chapter highlights the use of the 80-20 rule to focus on the 80% useful aspects of ggplot2 and the types of visualizations and questions that can be addressed using ggplot2.', 'The chapter details the prerequisites for following along, including the need for R and RStudio, with a recommendation for using RStudio due to its productivity and enjoyment benefits.']}, {'end': 700.686, 'segs': [{'end': 477.5, 'src': 'embed', 'start': 444.723, 'weight': 0, 'content': [{'end': 446.985, 'text': 
'But everyone is familiar with what happened on the Titanic.', 'start': 444.723, 'duration': 2.262}, {'end': 448.186, 'text': "So it's part of the reason why we use it.", 'start': 447.025, 'duration': 1.161}, {'end': 448.987, 'text': "It's super important.", 'start': 448.266, 'duration': 0.721}, {'end': 459.746, 'text': 'Next up, believe it or not, the actual Titanic dataset that you can get from Kaggle is actually a good proxy for common business data.', 'start': 450.828, 'duration': 8.918}, {'end': 464.353, 'text': 'For example, it is a good proxy for customer profile data.', 'start': 460.711, 'duration': 3.642}, {'end': 468.835, 'text': 'That is the kinds of things that you see in the Titanic data set.', 'start': 464.513, 'duration': 4.322}, {'end': 477.5, 'text': "the nature of the data, the fact that it's not 100% clean, the fact that it has overloaded data columns in terms of meaning,", 'start': 468.835, 'duration': 8.665}], 'summary': 'The titanic dataset from kaggle is a good proxy for common business data, such as customer profile data, due to its nature and cleanliness.', 'duration': 32.777, 'max_score': 444.723, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD4444723.jpg'}, {'end': 537.299, 'src': 'embed', 'start': 507.972, 'weight': 2, 'content': [{'end': 513.352, 'text': 'So to illustrate that, we can go over the data dictionary for the Titanic data set.', 'start': 507.972, 'duration': 5.38}, {'end': 517.014, 'text': 'And what you can see here is the data dictionary.', 'start': 514.053, 'duration': 2.961}, {'end': 524.696, 'text': 'And down this list of columns, this column here, we have all the variables, all the columns in the data frame that are available in the Titanic data.', 'start': 517.794, 'duration': 6.902}, {'end': 528.057, 'text': 'So first up, we have a variable called survived.', 'start': 525.136, 'duration': 2.921}, {'end': 530.937, 'text': 'And it defines survival, essentially.', 
'start': 529.097, 'duration': 1.84}, {'end': 532.278, 'text': "It's a binary indicator.", 'start': 531.258, 'duration': 1.02}, {'end': 537.299, 'text': "It's zero when a passenger in the Titanic did not make it, that they perished.", 'start': 532.698, 'duration': 4.601}], 'summary': "The titanic data set contains a binary indicator for survival, with 'survived' variable defining it.", 'duration': 29.327, 'max_score': 507.972, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD4507972.jpg'}, {'end': 597.133, 'src': 'embed', 'start': 554.418, 'weight': 3, 'content': [{'end': 560.003, 'text': 'All of these are designations of the level of accommodations that you have on the train.', 'start': 554.418, 'duration': 5.585}, {'end': 562.805, 'text': 'In a similar fashion on the Titanic.', 'start': 560.543, 'duration': 2.262}, {'end': 569.411, 'text': 'you also had first class tickets, second class tickets, third class tickets that defined your accommodation on the boat.', 'start': 562.805, 'duration': 6.606}, {'end': 573.634, 'text': 'Next up, we have an indicator of gender, the sex variable.', 'start': 570.592, 'duration': 3.042}, {'end': 575.116, 'text': "It's going to be male or female.", 'start': 573.955, 'duration': 1.161}, {'end': 578.359, 'text': 'We also have a variable in the data set for age.', 'start': 575.977, 'duration': 2.382}, {'end': 579.9, 'text': 'This is a continuous variable.', 'start': 578.559, 'duration': 1.341}, {'end': 581.442, 'text': "It's a numeric column.", 'start': 579.98, 'duration': 1.462}, {'end': 583.203, 'text': 'It has decimal points in it.', 'start': 582.122, 'duration': 1.081}, {'end': 586.286, 'text': "So for example, you'll see ages of 33.44 years, that sort of thing.", 'start': 583.884, 'duration': 2.402}, {'end': 592.251, 'text': 'And then next up we have two variables called sibspa and parch.', 'start': 589.389, 'duration': 2.862}, {'end': 597.133, 'text': 'And the reason why 
these are interesting is because what I mentioned on the previous slide,', 'start': 592.651, 'duration': 4.482}], 'summary': 'Transcript covers train accommodations, titanic comparison, gender, age, and related variables.', 'duration': 42.715, 'max_score': 554.418, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD4554418.jpg'}, {'end': 650.346, 'src': 'embed', 'start': 619.209, 'weight': 6, 'content': [{'end': 622.412, 'text': "Hey, there's this data value over here.", 'start': 619.209, 'duration': 3.203}, {'end': 623.492, 'text': 'I can overload it.', 'start': 622.432, 'duration': 1.06}, {'end': 626.155, 'text': 'So sometimes it means this, and sometimes it means that.', 'start': 623.613, 'duration': 2.542}, {'end': 629.918, 'text': 'Happens all the time in modern business systems.', 'start': 626.715, 'duration': 3.203}, {'end': 635.061, 'text': 'And this is what we actually have here in this SIBSP and parche variables.', 'start': 630.378, 'duration': 4.683}, {'end': 636.522, 'text': 'The first one is overloaded.', 'start': 635.341, 'duration': 1.181}, {'end': 642.884, 'text': "It is the count of the number of siblings and or whether or not you're traveling with a spouse.", 'start': 636.702, 'duration': 6.182}, {'end': 644.324, 'text': "Notice how it's overloaded.", 'start': 643.524, 'duration': 0.8}, {'end': 650.346, 'text': "It could either be just the number of brothers and sisters that you're traveling with, or it could be your spouse,", 'start': 644.364, 'duration': 5.982}], 'summary': 'Data values in sibsp and parche are overloaded, representing sibling count and spouse presence.', 'duration': 31.137, 'max_score': 619.209, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD4619209.jpg'}], 'start': 383.283, 'title': 'Using titanic dataset for ml', 'summary': "Discusses the reasons for using the kaggle competition's titanic machine learning from disaster 
dataset, including its widespread familiarity, and its representation of common business data, making it useful for diverse audiences and predictive modeling. it also outlines the data dictionary for the titanic data set, including variables such as survival rates, ticket class, gender, age, and overloaded data fields like siblings/spouse and parents/children counts.", 'chapters': [{'end': 505.818, 'start': 383.283, 'title': 'Using titanic dataset for ml', 'summary': "Discusses the reasons for using the kaggle competition's titanic machine learning from disaster dataset, including its widespread familiarity, and its representation of common business data, making it useful for diverse audiences and predictive modeling.", 'duration': 122.535, 'highlights': ['The Titanic dataset is extensively used due to its widespread familiarity, making it suitable for diverse audiences and tutorials.', 'The dataset serves as a good proxy for common business data, such as customer profile data, making it useful for predictive modeling.']}, {'end': 700.686, 'start': 507.972, 'title': 'Titanic data dictionary', 'summary': 'Outlines the data dictionary for the titanic data set, including variables such as survival rates, ticket class, gender, age, and overloaded data fields like siblings/spouse and parents/children counts.', 'duration': 192.714, 'highlights': ["The variable 'survived' indicates survival, with a value of 0 for non-survivors and 1 for survivors.", "The 'ticket class' variable represents the level of accommodations on the Titanic, categorized into first, second, and third classes.", "The 'sex' variable denotes the gender of the passenger, either male or female.", "The 'age' variable represents the continuous numeric age of the passengers, including decimal values.", "The 'sibspa' and 'parch' variables are indicative of overloaded data fields, representing counts of siblings/spouse and parents/children respectively."]}], 'duration': 317.403, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD4383283.jpg', 'highlights': ['The Titanic dataset is extensively used due to its widespread familiarity, making it suitable for diverse audiences and tutorials.', 'The dataset serves as a good proxy for common business data, such as customer profile data, making it useful for predictive modeling.', "The variable 'survived' indicates survival, with a value of 0 for non-survivors and 1 for survivors.", "The 'ticket class' variable represents the level of accommodations on the Titanic, categorized into first, second, and third classes.", "The 'sex' variable denotes the gender of the passenger, either male or female.", "The 'age' variable represents the continuous numeric age of the passengers, including decimal values.", "The 'sibspa' and 'parch' variables are indicative of overloaded data fields, representing counts of siblings/spouse and parents/children respectively."]}, {'end': 1314.657, 'segs': [{'end': 738.041, 'src': 'embed', 'start': 701.306, 'weight': 0, 'content': [{'end': 706.411, 'text': "And the reason why we're going through this data is to level set everybody on the webinar today about the data set.", 'start': 701.306, 'duration': 5.105}, {'end': 708.393, 'text': 'Because this is going to be super important.', 'start': 707.052, 'duration': 1.341}, {'end': 715.641, 'text': 'Because understanding these columns and what they mean are critical for actually saying, I have this type of business question.', 'start': 708.554, 'duration': 7.087}, {'end': 718.704, 'text': 'I would like to create a visualization that potentially answers that question.', 'start': 715.661, 'duration': 3.043}, {'end': 721.527, 'text': 'So you need to know what pieces of data to use in that visualization.', 'start': 718.784, 'duration': 2.743}, {'end': 727.171, 'text': "Okay, so now we're familiar with the data set, we can talk about our scenario.", 'start': 722.867, 'duration': 4.304}, {'end': 
730.053, 'text': 'So we are a consulting data scientist.', 'start': 728.352, 'duration': 1.701}, {'end': 731.515, 'text': 'This is our hypothetical scenario.', 'start': 730.114, 'duration': 1.401}, {'end': 738.041, 'text': "So we're a consulting data scientist, and we've been hired by a company to analyze the Titanic data set.", 'start': 731.695, 'duration': 6.346}], 'summary': 'Level setting on data set for consulting data scientist hired to analyze titanic data set.', 'duration': 36.735, 'max_score': 701.306, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD4701306.jpg'}, {'end': 905.467, 'src': 'embed', 'start': 869.98, 'weight': 2, 'content': [{'end': 870.561, 'text': "This is what you'll see.", 'start': 869.98, 'duration': 0.581}, {'end': 873.983, 'text': 'Create elegant data visualizations using the grammar of graphics.', 'start': 870.601, 'duration': 3.382}, {'end': 879.748, 'text': 'Not surprisingly, the grammar of graphics, or GG, is where the GG in ggplot2 comes from.', 'start': 874.664, 'duration': 5.084}, {'end': 886.774, 'text': 'Now, ggplot2 has become the de facto standard visualization tool in R.', 'start': 880.709, 'duration': 6.065}, {'end': 889.777, 'text': 'And to illustrate the point, you are probably like me.', 'start': 886.774, 'duration': 3.003}, {'end': 895.442, 'text': "You probably have various social media feeds, whether that's Twitter or LinkedIn or Facebook.", 'start': 890.257, 'duration': 5.185}, {'end': 905.467, 'text': "And if you're interested in data science topics, you'll get a daily feed of LinkedIn posts and articles or articles on Facebook or tweets.", 'start': 896.162, 'duration': 9.305}], 'summary': 'Ggplot2 is the de facto standard visualization tool in r for creating elegant data visualizations using the grammar of graphics.', 'duration': 35.487, 'max_score': 869.98, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD4869980.jpg'}, {'end': 1088.973, 'src': 'embed', 'start': 1039.564, 'weight': 3, 'content': [{'end': 1047.568, 'text': 'ggplot2 is both complicated with fine-grained control, allows you to do many, many things that you would like, exactly the way you would like,', 'start': 1039.564, 'duration': 8.004}, {'end': 1048.909, 'text': "and it's also easy to use.", 'start': 1047.568, 'duration': 1.341}, {'end': 1053.932, 'text': 'It literally will allow you to create print quality graphics in seconds.', 'start': 1049.986, 'duration': 3.946}, {'end': 1059.24, 'text': "Now, I know those two things seem to be at odds, but bear with me over the next couple of slides and I'll explain further.", 'start': 1054.252, 'duration': 4.988}, {'end': 1062.137, 'text': 'Okay, the grammar.', 'start': 1060.956, 'duration': 1.181}, {'end': 1067.843, 'text': 'So every visualization in ggplot2 is composed of the following things.', 'start': 1062.898, 'duration': 4.945}, {'end': 1072.568, 'text': 'This is the six aspects of the ggplot2 grammar of graphics.', 'start': 1067.903, 'duration': 4.665}, {'end': 1078.013, 'text': 'Okay, first up, you have the data, which is the raw material for your visualization.', 'start': 1073.449, 'duration': 4.564}, {'end': 1079.415, 'text': 'Makes total sense.', 'start': 1078.714, 'duration': 0.701}, {'end': 1081.156, 'text': 'You absolutely have to have data.', 'start': 1079.975, 'duration': 1.181}, {'end': 1082.758, 'text': 'No data, no visualization.', 'start': 1081.597, 'duration': 1.161}, {'end': 1088.973, 'text': 'Next up, another aspect of the grammar is what is known as layers.', 'start': 1084.73, 'duration': 4.243}], 'summary': 'Ggplot2 offers fine-grained control and creates print quality graphics in seconds, based on the six aspects of grammar of graphics.', 'duration': 49.409, 'max_score': 1039.564, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD41039564.jpg'}], 'start': 701.306, 'title': 'Titanic data analysis and ggplot2 introduction', 'summary': 'Discusses the importance of understanding the titanic data set for creating visualizations and analyzes survival patterns. it also introduces ggplot2 for intuitive visualizations, emphasizing its widespread usage and fine-grained control.', 'chapters': [{'end': 764.08, 'start': 701.306, 'title': 'Titanic data analysis scenario', 'summary': 'Discusses the importance of understanding the data set for creating visualizations and highlights the goal of analyzing the titanic data set to explain patterns of survival, as requested by the consulting business.', 'duration': 62.774, 'highlights': ['The consulting data scientist has been hired to analyze the Titanic data set to explain patterns of survival, as the business wants to understand who survived and who died and the reasons behind it.', 'Understanding the columns and their meanings in the data set is crucial for creating visualizations that answer specific business questions.']}, {'end': 1314.657, 'start': 764.74, 'title': 'Introduction to ggplot2 for data visualization', 'summary': "Discusses the importance of using ggplot2 for creating intuitive visualizations and its widespread usage in the data science community. 
it highlights the key aspects of ggplot2's grammar of graphics and emphasizes that while it offers fine-grained control, it can also quickly generate print-quality graphics.", 'duration': 549.917, 'highlights': ['The importance of using ggplot2 for creating intuitive visualizations and its widespread usage in the data science community, demonstrated through real-world analogs such as customer churn, fraud detection, and understanding conversion rates.', "The key aspects of ggplot2's grammar of graphics, including data, layers, scales, coordinates, faceting, and themes, providing a comprehensive understanding of the tool's capabilities.", "The balance between ggplot2's fine-grained control for creating customized graphics and its ability to work effectively with default settings, catering to users' varying needs and skill levels."]}], 'duration': 613.351, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD4701306.jpg', 'highlights': ['The consulting data scientist has been hired to analyze the Titanic data set to explain patterns of survival, as the business wants to understand who survived and who died and the reasons behind it.', 'Understanding the columns and their meanings in the data set is crucial for creating visualizations that answer specific business questions.', 'The importance of using ggplot2 for creating intuitive visualizations and its widespread usage in the data science community, demonstrated through real-world analogs such as customer churn, fraud detection, and understanding conversion rates.', "The key aspects of ggplot2's grammar of graphics, including data, layers, scales, coordinates, faceting, and themes, providing a comprehensive understanding of the tool's capabilities.", "The balance between ggplot2's fine-grained control for creating customized graphics and its ability to work effectively with default settings, catering to users' varying needs and skill levels."]}, {'end': 1644.765, 
'segs': [{'end': 1362.655, 'src': 'embed', 'start': 1314.657, 'weight': 0, 'content': [{'end': 1317.778, 'text': 'excuse me, what you need to work with the grammar of graphics.', 'start': 1314.657, 'duration': 3.121}, {'end': 1321.179, 'text': '80% of the time is, you know, a small subset.', 'start': 1317.778, 'duration': 3.401}, {'end': 1326.882, 'text': "So let's talk about what you absolutely have to have to get a ggplot2 graphic going in your R environment.", 'start': 1322.12, 'duration': 4.762}, {'end': 1330.223, 'text': 'So not surprising, the first thing that you need is the data.', 'start': 1327.902, 'duration': 2.321}, {'end': 1332.624, 'text': 'Okay And hopefully enough said.', 'start': 1330.243, 'duration': 2.381}, {'end': 1336.046, 'text': 'Next up, you need an aesthetic.', 'start': 1334.645, 'duration': 1.401}, {'end': 1341.265, 'text': 'And an aesthetic essentially is a mapping of your data to the visualization.', 'start': 1336.843, 'duration': 4.422}, {'end': 1346.268, 'text': "For example, let's say you're creating a plot, a chart,", 'start': 1342.746, 'duration': 3.522}, {'end': 1352.851, 'text': 'and you would like the y-axis to essentially be mapped to the ages of the Titanic passengers in your dataset.', 'start': 1346.268, 'duration': 6.583}, {'end': 1356.512, 'text': 'So the y-axis is age and the x-axis is something else.', 'start': 1353.231, 'duration': 3.281}, {'end': 1362.655, 'text': 'You can actually do that directly in R code using the ggplot2 API.', 'start': 1357.273, 'duration': 5.382}], 'summary': 'To create a ggplot2 graphic in r, you need data and an aesthetic for visualization.', 'duration': 47.998, 'max_score': 1314.657, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD41314657.jpg'}, {'end': 1470.649, 'src': 'embed', 'start': 1439.257, 'weight': 2, 'content': [{'end': 1444.439, 'text': "If you don't tell ggplot that you want dots or lines or bars, it doesn't know what to 
actually display.", 'start': 1439.257, 'duration': 5.182}, {'end': 1445.84, 'text': 'So you have to provide a layer.', 'start': 1444.86, 'duration': 0.98}, {'end': 1458.865, 'text': 'So these layers, these graphics that you have ggplot render for you, take the form, in R code, of geom functions.', 'start': 1446.82, 'duration': 12.045}, {'end': 1470.649, 'text': 'So, literally, in the package there are dozens of functions that all start with the prefix geom, which is short for geometry, for example geom_point.', 'start': 1459.165, 'duration': 11.484}], 'summary': 'To display visuals in ggplot, you need to provide a layer with geom functions.', 'duration': 31.392, 'max_score': 1439.257, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD41439257.jpg'}, {'end': 1535.747, 'src': 'embed', 'start': 1509.949, 'weight': 3, 'content': [{'end': 1515.335, 'text': "Because, quite frankly, Hadley Wickham wrote ggplot2; he's the creator of the package.", 'start': 1509.949, 'duration': 5.386}, {'end': 1518.838, 'text': 'so who better to teach you ggplot2 than the man himself, Mr. 
Wickham?', 'start': 1515.335, 'duration': 3.503}, {'end': 1527.047, 'text': 'Another reason why I love this book is because it is a great resource for folks of all levels of skill and experience.', 'start': 1520.04, 'duration': 7.007}, {'end': 1535.747, 'text': 'And so in my particular case, let me use myself as an example, By the time I bought this book, I actually had quite a bit of experience with ggplot2.', 'start': 1527.588, 'duration': 8.159}], 'summary': 'Hadley wickham, creator of ggplot2, offers a valuable resource for all skill levels.', 'duration': 25.798, 'max_score': 1509.949, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD41509949.jpg'}, {'end': 1576.856, 'src': 'embed', 'start': 1549.076, 'weight': 4, 'content': [{'end': 1554.3, 'text': 'So by the time I bought the book, I actually knew quite a bit about ggplot2, but I still learned a lot by reading it.', 'start': 1549.076, 'duration': 5.224}, {'end': 1561.365, 'text': "And what's even better yet is that even if you're an experienced excuse me, even if you have absolutely no experience in ggplot2,", 'start': 1555.06, 'duration': 6.305}, {'end': 1565.007, 'text': "you're an absolute beginner with ggplot2, the book is not too advanced.", 'start': 1561.365, 'duration': 3.642}, {'end': 1573.393, 'text': "It taught me things, even though I've been using ggplot2 for a number of years, and it can teach folks with no experience in ggplot2.", 'start': 1565.968, 'duration': 7.425}, {'end': 1576.856, 'text': "It's great for both types of folks, so I absolutely positively recommend it.", 'start': 1573.473, 'duration': 3.383}], 'summary': 'A book on ggplot2 is suitable for both beginners and experienced users, offering valuable insights.', 'duration': 27.78, 'max_score': 1549.076, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD41549076.jpg'}, {'end': 1633.081, 'src': 'embed', 'start': 1605.796, 
'weight': 5, 'content': [{'end': 1609.218, 'text': "So it's pointing to the right place on my local hard drive here to actually read in the data.", 'start': 1605.796, 'duration': 3.422}, {'end': 1617.83, 'text': 'So if you do not have ggplot2, you can highlight this code right here and run it, and that would install ggplot2.', 'start': 1610.325, 'duration': 7.505}, {'end': 1625.276, 'text': 'R will reach across the internet to a repository, grab the ggplot2 binary, pull it down, install it in your local environment.', 'start': 1618.01, 'duration': 7.266}, {'end': 1633.081, 'text': "I already have ggplot2 installed, so I'll just go ahead and load it first thing, because that's what we need to complete this particular video.", 'start': 1625.636, 'duration': 7.445}], 'summary': 'R can install ggplot2 from the repository and load it for use.', 'duration': 27.285, 'max_score': 1605.796, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD41605796.jpg'}], 'start': 1314.657, 'title': 'Essentials of ggplot2 visualization', 'summary': 'Outlines the essential components needed to create a ggplot2 visualization in r, including data necessity, aesthetics for data mapping, and layers for defining visualization with key examples of geom functions, emphasizing code-based aesthetic mappings.', 'chapters': [{'end': 1484.534, 'start': 1314.657, 'title': 'Essentials of ggplot2 visualization', 'summary': 'Outlines the essential components needed to create a ggplot2 visualization in r, including the necessity of data, aesthetics for data mapping, and the requirement of layers for defining the visualization, with key examples of geom functions, emphasizing the need for code-based aesthetic mappings.', 'duration': 169.877, 'highlights': ['The necessity of data for creating a ggplot2 visualization is emphasized, constituting a fundamental requirement. 
', 'The significance of aesthetics in mapping data to the visualization is highlighted, emphasizing the need for code-based aesthetic mappings as a requisite for ggplot2.', 'The requirement of layers for defining the visualization is explained, stressing the need for providing specific functions like geom functions to render the visualization as intended.']}, {'end': 1644.765, 'start': 1485.494, 'title': "Learning ggplot2 with Hadley Wickham's book", 'summary': "Discusses the value of 'ggplot2' by Hadley Wickham as the best resource for learning ggplot2, suitable for all levels of skill and experience, and the ease of installation and usage of RStudio and ggplot2.", 'duration': 159.271, 'highlights': ["Learning from 'ggplot2' by Hadley Wickham, the creator of the package, is recommended for all levels of experience as it provides valuable insights and learning opportunities.", 'The book is beneficial for both experienced individuals and absolute beginners in ggplot2, offering valuable insights and knowledge even for those with prior experience in ggplot2.', 'The ease of installing ggplot2 using RStudio, with the option of installing it directly from the internet, is highlighted, providing a seamless setup process for users.']}], 'duration': 330.108, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD41314657.jpg', 'highlights': ['The necessity of data for creating a ggplot2 visualization is emphasized, constituting a fundamental requirement.', 'The significance of aesthetics in mapping data to the visualization is highlighted, emphasizing the need for code-based aesthetic mappings as a requisite for ggplot2.', 'The requirement of layers for defining the visualization is explained, stressing the need for providing specific functions like geom functions to render the visualization as intended. 
', "Learning from 'ggplot2' by Hadley Wickham, the creator of the package, is recommended for all levels of experience as it provides valuable insights and learning opportunities.", 'The book is beneficial for both experienced individuals and absolute beginners in ggplot2, offering valuable insights and knowledge even for those with prior experience in ggplot2.', 'The ease of installing ggplot2 using RStudio, with the option of installing it directly from the internet, is highlighted, providing a seamless setup process for users.']}, {'end': 2409.184, 'segs': [{'end': 1668.206, 'src': 'embed', 'start': 1644.765, 'weight': 0, 'content': [{'end': 1653.088, 'text': "load up the CSV into a data frame and then it'll actually show us the Titanic data in RStudio's spreadsheet view using this View function.", 'start': 1644.765, 'duration': 8.323}, {'end': 1654.428, 'text': "So let's go ahead and do that.", 'start': 1653.668, 'duration': 0.76}, {'end': 1660.644, 'text': "Okay, so the first thing that we'll notice is that this data set's not particularly large.", 'start': 1656.263, 'duration': 4.381}, {'end': 1668.206, 'text': "It's only 891 observations of 12 columns of data, but, as I indicated before, despite its small size,", 'start': 1660.844, 'duration': 7.362}], 'summary': 'Loading Titanic data in RStudio reveals 891 observations and 12 columns.', 'duration': 23.441, 'max_score': 1644.765, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD41644765.jpg'}, {'end': 1865.676, 'src': 'embed', 'start': 1829.05, 'weight': 2, 'content': [{'end': 1830.172, 'text': "We wouldn't do math like that.", 'start': 1829.05, 'duration': 1.122}, {'end': 1834.757, 'text': "Ergo, survived really isn't what is known as ratio data.", 'start': 1830.412, 'duration': 4.345}, {'end': 1835.818, 'text': "It's not really numeric.", 'start': 1834.917, 'duration': 0.901}, {'end': 1838.281, 'text': "We're not going to use it for any 
sort of numeric calculations.", 'start': 1835.838, 'duration': 2.443}, {'end': 1842.663, 'text': "So that's an indication to us that it should probably be a factor.", 'start': 1838.921, 'duration': 3.742}, {'end': 1844.444, 'text': 'It should be a categorical variable.', 'start': 1843.023, 'duration': 1.421}, {'end': 1848.066, 'text': "Either you survived or you didn't, essentially.", 'start': 1845.165, 'duration': 2.901}, {'end': 1851.448, 'text': 'In a similar fashion, we can analyze Pclass.', 'start': 1849.367, 'duration': 2.081}, {'end': 1856.291, 'text': 'So Pclass has three values, one, two, and three.', 'start': 1852.408, 'duration': 3.883}, {'end': 1865.676, 'text': 'And again, we can apply our heuristic of saying, would we do multiplication or division on Pclass data? And the answer is no.', 'start': 1856.871, 'duration': 8.805}], 'summary': 'Survived and Pclass are categorical variables, not suitable for numeric calculations.', 'duration': 36.626, 'max_score': 1829.05, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD41829050.jpg'}, {'end': 2009.561, 'src': 'heatmap', 'start': 1922.581, 'weight': 0.94, 'content': [{'end': 1926.084, 'text': "Notice in the age column, we're missing data.", 'start': 1922.581, 'duration': 3.503}, {'end': 1928.225, 'text': "We're missing data in the age column here.", 'start': 1926.804, 'duration': 1.421}, {'end': 1929.886, 'text': "That's going to be important later.", 'start': 1928.866, 'duration': 1.02}, {'end': 1934.53, 'text': 'ggplot2 will accommodate this and we should take a look at it,', 'start': 1930.507, 'duration': 4.023}, {'end': 1942.815, 'text': "just so that we're aware of what ggplot does in the situation where we try to create visualizations on columns where some data is missing.", 'start': 1934.53, 'duration': 8.285}, {'end': 1944.957, 'text': "OK, so there's our data.", 'start': 1943.656, 'duration': 1.301}, {'end': 1952.888, 'text': "So 
obviously, next up, we're going to go ahead and set all of these variables that should be factors as factors.", 'start': 1946.761, 'duration': 6.127}, {'end': 1956.492, 'text': "So I'm just going to run this code, and good to go.", 'start': 1952.928, 'duration': 3.564}, {'end': 1961.658, 'text': 'Now, the reason why this is important is because ggplot2 is smart.', 'start': 1957.313, 'duration': 4.345}, {'end': 1963.28, 'text': 'ggplot2 is smart.', 'start': 1962.379, 'duration': 0.901}, {'end': 1972.248, 'text': 'In certain aspects of how you code up visualizations in ggplot2, if you provide it, factor variables as part of the visualization,', 'start': 1964.281, 'duration': 7.967}, {'end': 1976.571, 'text': 'as part of the aesthetic, as part of the mapping, it will do cool and interesting things.', 'start': 1972.248, 'duration': 4.323}, {'end': 1978.233, 'text': "As we'll see in a bit,", 'start': 1977.052, 'duration': 1.181}, {'end': 1985.699, 'text': 'we can actually use factor variables to actually color code our visualizations and really make them pop in terms of the information that they can portray.', 'start': 1978.233, 'duration': 7.466}, {'end': 1987.974, 'text': 'Okay, cool.', 'start': 1986.914, 'duration': 1.06}, {'end': 1989.895, 'text': "So we've got our data prepped and we're good to go.", 'start': 1987.994, 'duration': 1.901}, {'end': 1995.097, 'text': "So going back to our scenario, we're a consulting data scientist.", 'start': 1990.755, 'duration': 4.342}, {'end': 2001.338, 'text': "We've been hired by a company to analyze the Titanic data and understand the underlying signal in the data.", 'start': 1995.257, 'duration': 6.081}, {'end': 2009.561, 'text': "Explain to me, based on the data, the story of who survives and who doesn't in the Titanic.", 'start': 2002.119, 'duration': 7.442}], 'summary': 'Preparing data for visualization with ggplot2, focusing on factor variables and missing data.', 'duration': 86.98, 'max_score': 1922.581, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD41922581.jpg'}, {'end': 1972.248, 'src': 'embed', 'start': 1946.761, 'weight': 1, 'content': [{'end': 1952.888, 'text': "So obviously, next up, we're going to go ahead and set all of these variables that should be factors as factors.", 'start': 1946.761, 'duration': 6.127}, {'end': 1956.492, 'text': "So I'm just going to run this code, and good to go.", 'start': 1952.928, 'duration': 3.564}, {'end': 1961.658, 'text': 'Now, the reason why this is important is because ggplot2 is smart.', 'start': 1957.313, 'duration': 4.345}, {'end': 1963.28, 'text': 'ggplot2 is smart.', 'start': 1962.379, 'duration': 0.901}, {'end': 1972.248, 'text': 'In certain aspects of how you code up visualizations in ggplot2, if you provide it, factor variables as part of the visualization,', 'start': 1964.281, 'duration': 7.967}], 'summary': 'Setting factor variables is important for smart visualizations in ggplot2.', 'duration': 25.487, 'max_score': 1946.761, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD41946761.jpg'}, {'end': 2077.038, 'src': 'embed', 'start': 2052.492, 'weight': 4, 'content': [{'end': 2060.21, 'text': "And in particular, we're saying, hey, ggplot2, map the x-axis to the survived column of the Titanic data set.", 'start': 2052.492, 'duration': 7.718}, {'end': 2062.992, 'text': 'And then lastly, we need a layer.', 'start': 2061.411, 'duration': 1.581}, {'end': 2071.155, 'text': "And here we're saying, please use geom underscore bar, which actually translates in ggplot2 to a bar chart.", 'start': 2063.331, 'duration': 7.824}, {'end': 2077.038, 'text': 'A bar chart is an excellent thing to use for categorical data because essentially a bar chart is just a count.', 'start': 2071.716, 'duration': 5.322}], 'summary': "Using ggplot2 to create a bar chart for the 'survived' column in the titanic data set, a great tool for categorical 
data analysis.", 'duration': 24.546, 'max_score': 2052.492, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD42052492.jpg'}, {'end': 2222.67, 'src': 'embed', 'start': 2196.932, 'weight': 3, 'content': [{'end': 2202.857, 'text': 'very easily we have a print quality graphic, a print quality visualization.', 'start': 2196.932, 'duration': 5.925}, {'end': 2211.724, 'text': "And you'll notice that along the x-axis we have survived, zero for those folks that perished on the Titanic, and a one for those people that survived.", 'start': 2203.818, 'duration': 7.906}, {'end': 2218.19, 'text': 'And notice right away we can tell that more people perished than survived, unfortunately, on the Titanic.', 'start': 2212.285, 'duration': 5.905}, {'end': 2222.67, 'text': 'And we can also kind of estimate relative proportions here as well.', 'start': 2218.789, 'duration': 3.881}], 'summary': 'Print quality graphic shows more perished than survived on the titanic.', 'duration': 25.738, 'max_score': 2196.932, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD42196932.jpg'}, {'end': 2312.842, 'src': 'embed', 'start': 2285.922, 'weight': 5, 'content': [{'end': 2292.046, 'text': 'And if you go back here and look at the graphic, yeah, you can roughly kind of eyeball and say, yeah, that seems about right.', 'start': 2285.922, 'duration': 6.124}, {'end': 2293.647, 'text': 'Seems about right.', 'start': 2293.127, 'duration': 0.52}, {'end': 2298.231, 'text': 'Okay So moving on.', 'start': 2294.848, 'duration': 3.383}, {'end': 2307.658, 'text': "So this graphic is print quality, but it's probably not customized enough to actually print.", 'start': 2299.112, 'duration': 8.546}, {'end': 2312.842, 'text': "So let's add a little bit of ggplot2 code to actually improve this a little bit.", 'start': 2308.379, 'duration': 4.463}], 'summary': 'Adding ggplot2 code to improve print quality 
graphic.', 'duration': 26.92, 'max_score': 2285.922, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD42285922.jpg'}], 'start': 1644.765, 'title': 'Analyzing titanic data in rstudio and ggplot2', 'summary': 'Covers loading and analyzing the titanic dataset in rstudio, emphasizing key columns and categorizing variables. it also discusses using ggplot2 to analyze survival rates, revealing that about 62% of passengers perished and 38% survived.', 'chapters': [{'end': 1989.895, 'start': 1644.765, 'title': 'Analyzing titanic data in rstudio', 'summary': "Covers the process of loading and analyzing the titanic dataset in rstudio, highlighting the dataset's size, key columns, and the importance of categorizing variables, with specific focus on 'survived', 'p-class', 'sex', and 'embarked'.", 'duration': 345.13, 'highlights': ['The Titanic dataset contains 891 observations of 12 columns of data, extensively used for teaching purposes at Data Science Dojo. The dataset consists of 891 observations and 12 columns, making it a valuable resource for educational purposes at Data Science Dojo.', "Highlighting the insignificance of the 'name' and 'passenger ID' columns due to their lack of analytical signal, with 'passenger ID' serving as a unique identifier and 'name' being composed of 891 unique names. The 'passenger ID' and 'name' columns are deemed insignificant for analysis due to their lack of analytical signal, with 'passenger ID' acting as a unique identifier and 'name' consisting of 891 unique names.", "Emphasizing the importance of categorizing variables like 'survived', 'p-class', 'sex', and 'embarked' as factors, based on their nature and usage in analytical context. 
The significance of categorizing variables such as 'survived', 'p-class', 'sex', and 'embarked' as factors is underscored, considering their nature and usage in analytical context.", 'Preparing the data by setting variables that should be factors as factors, enabling the use of factor variables in visualizations using ggplot2. The process of preparing the data involves setting the necessary variables as factors to leverage them in visualizations using ggplot2.']}, {'end': 2409.184, 'start': 1990.755, 'title': 'Analyzing titanic data with ggplot2', 'summary': 'Discusses using ggplot2 to analyze the survival rates of the titanic data set, indicating that about 62% of the passengers unfortunately perished while approximately 38% survived, and demonstrates customizing print quality graphics with ggplot2.', 'duration': 418.429, 'highlights': ['About 62% of the passengers unfortunately perished on the Titanic, and about 38% survived. Demonstrates the key insight into the survival rates of the Titanic data set.', 'The chapter discusses using ggplot2 to analyze the survival rates of the Titanic data set. Provides an overview of the main focus of the chapter.', 'Demonstrates customizing print quality graphics with ggplot2. 
Highlights the practical application of ggplot2 for improving the visualization of data.']}], 'duration': 764.419, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD41644765.jpg', 'highlights': ['The Titanic dataset contains 891 observations of 12 columns of data, making it a valuable resource for educational purposes at Data Science Dojo.', 'The process of preparing the data involves setting the necessary variables as factors to leverage them in visualizations using ggplot2.', "The significance of categorizing variables such as 'survived', 'p-class', 'sex', and 'embarked' as factors is underscored, considering their nature and usage in analytical context.", 'About 62% of the passengers unfortunately perished on the Titanic, and about 38% survived. Demonstrates the key insight into the survival rates of the Titanic data set.', 'The chapter discusses using ggplot2 to analyze the survival rates of the Titanic data set. Provides an overview of the main focus of the chapter.', 'Demonstrates customizing print quality graphics with ggplot2. 
Highlights the practical application of ggplot2 for improving the visualization of data.']}, {'end': 2759.158, 'segs': [{'end': 2434.468, 'src': 'embed', 'start': 2409.804, 'weight': 2, 'content': [{'end': 2416.116, 'text': 'So if I highlight all this code and I run it, You can now see our modified visualization.', 'start': 2409.804, 'duration': 6.312}, {'end': 2423.461, 'text': 'And notice here that we get a nice passenger count label here on the y-axis.', 'start': 2417.057, 'duration': 6.404}, {'end': 2427.583, 'text': 'We get a new title called Titanic Survival Rates and a nice white background.', 'start': 2423.621, 'duration': 3.962}, {'end': 2430.325, 'text': 'So this is actually a print quality graphic.', 'start': 2428.024, 'duration': 2.301}, {'end': 2434.468, 'text': 'You could actually submit this to, for example, an academic journal for publication.', 'start': 2430.385, 'duration': 4.083}], 'summary': 'Modified visualization with passenger count label and new title, suitable for academic publication.', 'duration': 24.664, 'max_score': 2409.804, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD42409804.jpg'}, {'end': 2502.313, 'src': 'embed', 'start': 2457.154, 'weight': 3, 'content': [{'end': 2463.776, 'text': "If you're familiar with the nautical space, there is a popular adage that essentially says women and children first.", 'start': 2457.154, 'duration': 6.622}, {'end': 2472.601, 'text': "So being relatively new to the space, let's say, as this consulting data scientist, we assume, hey, that's a reasonable hypothesis to explore.", 'start': 2464.756, 'duration': 7.845}, {'end': 2481.225, 'text': "Can I actually use ggplot2 to answer the question, what was the survival rate by gender? 
And that's actually relatively easy.", 'start': 2473.101, 'duration': 8.124}, {'end': 2490.631, 'text': 'Notice that once again, I can take this copy and paste, reuse from the previous visualization and just modify it a little bit for it to work.', 'start': 2481.405, 'duration': 9.226}, {'end': 2491.851, 'text': 'now to answer this question.', 'start': 2490.631, 'duration': 1.22}, {'end': 2496.452, 'text': 'And the two things that I needed to change to answer this question.', 'start': 2492.011, 'duration': 4.441}, {'end': 2502.313, 'text': 'well, technically, the three things that I needed to change are first of all, I map the x-axis now to sex.', 'start': 2496.452, 'duration': 5.861}], 'summary': 'Consulting data scientist explores survival rate by gender using ggplot2.', 'duration': 45.159, 'max_score': 2457.154, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD42457154.jpg'}, {'end': 2610.098, 'src': 'heatmap', 'start': 2514.995, 'weight': 0.817, 'content': [{'end': 2525.606, 'text': 'If you provide fill as part of your aesthetic mapping to ggplot2, and you assign to it a categorical variable, what you get is a nice color coding.', 'start': 2514.995, 'duration': 10.611}, {'end': 2535.234, 'text': 'Many, many of the geoms of the layers in ggplot2 have the ability to be filled, to be color-coded with a fill rate.', 'start': 2526.327, 'duration': 8.907}, {'end': 2545.858, 'text': 'And what this does essentially is says look, each of my bars will now be filled based on the value of survived, whether it was 0 or 1,', 'start': 2535.594, 'duration': 10.264}, {'end': 2549.32, 'text': 'and fill the bar up proportional to the counts that you get.', 'start': 2545.858, 'duration': 3.462}, {'end': 2557.783, 'text': 'And this allows you to essentially get a bar chart that will show you the relative proportions of those who lived and who died by the sex variable by gender.', 'start': 2549.4, 'duration': 8.383}, {'end': 
2563.841, 'text': 'And the third thing that I need to change, obviously, is I just need to update the title here to make it correct.', 'start': 2559.025, 'duration': 4.816}, {'end': 2568.472, 'text': 'So if I highlight all this code and run it, Look at that.', 'start': 2564.363, 'duration': 4.109}, {'end': 2570.394, 'text': 'Nice This really pops.', 'start': 2568.953, 'duration': 1.441}, {'end': 2571.775, 'text': 'This really pops.', 'start': 2571.134, 'duration': 0.641}, {'end': 2574.136, 'text': 'I can just make this a little bit bigger.', 'start': 2572.535, 'duration': 1.601}, {'end': 2579.6, 'text': 'And you can see here, as we would expect, sex is along the horizontal here, along the x-axis.', 'start': 2574.877, 'duration': 4.723}, {'end': 2580.821, 'text': 'I have females and males.', 'start': 2579.64, 'duration': 1.181}, {'end': 2583.602, 'text': 'And notice how much information I get out of this.', 'start': 2581.581, 'duration': 2.021}, {'end': 2587.065, 'text': 'First up, obviously, I have color coding.', 'start': 2584.783, 'duration': 2.282}, {'end': 2596.07, 'text': 'Survived This orangish color here indicates somebody, a passenger of that Titanic did not make it.', 'start': 2587.485, 'duration': 8.585}, {'end': 2596.731, 'text': 'They perished.', 'start': 2596.11, 'duration': 0.621}, {'end': 2600.673, 'text': 'And this green or teal color actually denotes that they, in fact, survived.', 'start': 2597.451, 'duration': 3.222}, {'end': 2610.098, 'text': 'And the first thing that you notice obviously is this big orange colored block right here, which says look, most males did not survive on the Titanic.', 'start': 2600.753, 'duration': 9.345}], 'summary': 'Using ggplot2, color-coded bar chart shows survival rates by gender on titanic.', 'duration': 95.103, 'max_score': 2514.995, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD42514995.jpg'}, {'end': 2649.113, 'src': 'embed', 'start': 2613.2, 'weight': 
0, 'content': [{'end': 2620.829, 'text': 'The other thing that probably grabbed your attention was this big teal colored box, which shows you that most females on the Titanic survived.', 'start': 2613.2, 'duration': 7.629}, {'end': 2623.592, 'text': 'So what this tells you is a couple different things.', 'start': 2621.469, 'duration': 2.123}, {'end': 2629.558, 'text': 'First up, females survived disproportionately more than males did on the Titanic.', 'start': 2623.952, 'duration': 5.606}, {'end': 2636.524, 'text': 'and also it tells you that there were a lot more males on the titanic to begin with than there were females, approximately a two to one ratio,', 'start': 2630.299, 'duration': 6.225}, {'end': 2639.426, 'text': 'twice as many males approximately than there were females.', 'start': 2636.524, 'duration': 2.902}, {'end': 2641.327, 'text': 'so this is a great visualization.', 'start': 2639.426, 'duration': 1.901}, {'end': 2649.113, 'text': 'notice how, from a storytelling perspective, this is extremely intuitive and very easy to to talk to non-data savvy professionals about,', 'start': 2641.327, 'duration': 7.786}], 'summary': 'Most females on the titanic survived, disproportional to males.', 'duration': 35.913, 'max_score': 2613.2, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD42613200.jpg'}], 'start': 2409.804, 'title': 'Creating quality visualizations with ggplot2', 'summary': 'Demonstrates creating print-quality visualizations using ggplot2, exploring a working hypothesis for survival rate by gender and revealing a two to one ratio of males to females in titanic survival, highlighting the importance of gender in survival.', 'chapters': [{'end': 2549.32, 'start': 2409.804, 'title': 'Creating quality visualizations with ggplot2', 'summary': 'Demonstrates how to create print-quality visualizations using ggplot2, with modifications for academic journal publication and exploring a working hypothesis 
for survival rate by gender, providing a quick and easy process for data analysis.', 'duration': 139.516, 'highlights': ['The chapter demonstrates the process of creating a print-quality graphic using ggplot2, suitable for academic journal publication. Demonstrates modifying visualizations for academic publication, indicating the ease and speed of the process.', 'Exploring a working hypothesis for survival rate by gender using ggplot2, which is relatively easy and quick to do. Shows the ease of exploring hypotheses using ggplot2, indicating the quick and straightforward nature of the process.', 'Modifying and reusing previous visualizations to answer new questions, such as survival rate by gender, by mapping the x-axis to sex and adding a fill for color coding. Emphasizes the efficiency of modifying and reusing code, detailing the specific changes required to answer new questions.']}, {'end': 2759.158, 'start': 2549.4, 'title': 'Titanic survival visualization', 'summary': 'Explores a visualization showing the disproportionate survival rates by gender on the titanic, revealing that females survived more than males, with approximately a two to one ratio of males to females, and provides insights into the importance of gender in survival.', 'duration': 209.758, 'highlights': ['Females survived disproportionately more than males did on the Titanic. The visualization reveals that most females on the Titanic survived, indicating a disproportionate survival rate compared to males.', 'Approximately a two to one ratio of males to females on the Titanic. The visualization shows that there were approximately twice as many males as females on the Titanic, providing quantifiable data on the gender ratio.', 'The visualization provides intuitive storytelling and easy understanding for non-data savvy professionals. 
It is highlighted that the visualization is extremely intuitive and easy to comprehend, making it a great way to convey the story to non-data savvy professionals.', 'The hypothesis that the class of the ticket may have played a role in survival is discussed. The discussion revolves around the hypothesis that the class of the ticket, particularly first and second class, may have influenced survival due to the proximity to lifeboats, presenting an additional factor to consider in survival analysis.']}], 'duration': 349.354, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD42409804.jpg', 'highlights': ['Females survived disproportionately more than males did on the Titanic.', 'Approximately a two to one ratio of males to females on the Titanic.', 'The chapter demonstrates the process of creating a print-quality graphic using ggplot2, suitable for academic journal publication.', 'Exploring a working hypothesis for survival rate by gender using ggplot2, which is relatively easy and quick to do.', 'Modifying and reusing previous visualizations to answer new questions, such as survival rate by gender, by mapping the x-axis to sex and adding a fill for color coding.']}, {'end': 3276.521, 'segs': [{'end': 2828.659, 'src': 'embed', 'start': 2786.072, 'weight': 0, 'content': [{'end': 2787.993, 'text': 'And again, I want to stress this enough.', 'start': 2786.072, 'duration': 1.921}, {'end': 2789.574, 'text': "I can't stress this enough.", 'start': 2788.753, 'duration': 0.821}, {'end': 2797.217, 'text': "You're actually pretty productive with ggplot2 because of this copy and paste reuse technique.", 'start': 2790.954, 'duration': 6.263}, {'end': 2801.899, 'text': 'I mean it literally only takes you seconds to run through a whole bunch of visualizations,', 'start': 2797.277, 'duration': 4.622}, {'end': 2804.44, 'text': 'because essentially you type out the first one in your R code.', 'start': 2801.899, 'duration': 2.541}, 
{'end': 2807.602, 'text': "And after that, it's just a bunch of copy and pasting and tweaking.", 'start': 2804.941, 'duration': 2.661}, {'end': 2814.933, 'text': 'As many folks know, I do have experience with tools like Tableau.', 'start': 2811.391, 'duration': 3.542}, {'end': 2818.735, 'text': 'I do have experience with tools like Power BI and creating graphics in Excel.', 'start': 2814.993, 'duration': 3.742}, {'end': 2822.797, 'text': 'But I use R almost exclusively for all my data visualization.', 'start': 2819.135, 'duration': 3.662}, {'end': 2828.659, 'text': 'And the reason for that is I have fine-grained control over my visualizations because of the grammar of graphics.', 'start': 2822.817, 'duration': 5.842}], 'summary': "R's ggplot2 allows quick, fine-grained visualization, increasing productivity.", 'duration': 42.587, 'max_score': 2786.072, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD42786072.jpg'}, {'end': 3235.187, 'src': 'embed', 'start': 3190.799, 'weight': 3, 'content': [{'end': 3201.925, 'text': 'males in first class had the highest levels of survivability and in second class probably a bit higher than third class, but not by a lot,', 'start': 3190.799, 'duration': 11.126}, {'end': 3205.507, 'text': 'and most likely in third class the least chance of survival.', 'start': 3201.925, 'duration': 3.582}, {'end': 3207.779, 'text': 'So this is super interesting.', 'start': 3206.718, 'duration': 1.061}, {'end': 3210.601, 'text': 'And also notice, though, too, something else that pops out at this.', 'start': 3207.939, 'duration': 2.662}, {'end': 3217.167, 'text': 'The relative proportion of females to males based on class.', 'start': 3211.542, 'duration': 5.625}, {'end': 3223.413, 'text': 'Now notice that the ratios between males and females are actually far closer in first and second class than they are in third class.', 'start': 3217.267, 'duration': 6.146}, {'end': 3225.895, 'text': 'There are 
way more males in third class than there are females.', 'start': 3223.433, 'duration': 2.462}, {'end': 3227.056, 'text': 'Way more.', 'start': 3226.756, 'duration': 0.3}, {'end': 3233.466, 'text': 'This is potentially interesting and also something that may be worthy of further investigation as you drill into the data.', 'start': 3228.322, 'duration': 5.144}, {'end': 3235.187, 'text': 'But for our purposes.', 'start': 3234.086, 'duration': 1.101}], 'summary': 'First class males had highest survivability, with closer male-female ratios in first and second class than in third class.', 'duration': 44.388, 'max_score': 3190.799, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD43190799.jpg'}, {'end': 3290.019, 'src': 'embed', 'start': 3259.957, 'weight': 5, 'content': [{'end': 3267.381, 'text': 'but if we go back to our original hypothesis of women and children first, one of the things we also need to take a look at is age.', 'start': 3259.957, 'duration': 7.424}, {'end': 3274.756, 'text': 'Because gender is one aspect of women and children first, and age is the other aspect of women and children first.', 'start': 3268.578, 'duration': 6.178}, {'end': 3276.521, 'text': "So let's go ahead and take a look at that.", 'start': 3275.337, 'duration': 1.184}, {'end': 3290.019, 'text': 'Okay, so there are a number of visualizations that we can use in ggplot2 that specifically work really well for numeric data, for continuous data.', 'start': 3280.393, 'duration': 9.626}], 'summary': "Analyzing the 'women and children first' hypothesis, considering gender and age aspects using ggplot2 visualizations.", 'duration': 30.062, 'max_score': 3259.957, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD43259957.jpg'}], 'start': 2759.798, 'title': 'Data visualization and titanic passenger survival analysis', 'summary': 'Discusses the productivity benefits of using ggplot2 for 
efficient data visualization, allowing for quick creation of multiple visualizations and fine-grained control. It also explores the survival rates of Titanic passengers based on class, gender, and age, revealing interesting insights such as the highest survival rates for first-class females and the lowest for third-class males.', 'chapters': [{'end': 2851.493, 'start': 2759.798, 'title': 'Efficient data visualization with ggplot2', 'summary': "Discusses the productivity benefits of using the copy and paste reuse technique in ggplot2, allowing for quick creation of multiple visualizations and the ability to achieve fine-grained control over visualizations compared to other tools, as explained by the speaker's personal experience with tools like Tableau and Power BI.", 'duration': 91.695, 'highlights': ["The copy and paste reuse technique in ggplot2 allows for quick creation of multiple visualizations, taking only seconds to run through a whole bunch of visualizations, as emphasized by the speaker's experience.", 'R provides fine-grained control over visualizations through the grammar of graphics, enabling the speaker to create visualizations faster in R than in other visualization tools like Tableau and Power BI.', "The speaker emphasizes the productivity benefits of using ggplot2, highlighting that it doesn't require special coding skills and can be achieved through practice."]}, {'end': 3276.521, 'start': 2851.633, 'title': 'Titanic passenger survival analysis', 'summary': 'Explores the survival rates of Titanic passengers based on class and gender, revealing that first-class females had the highest survival rates, and third-class males had the lowest, with more males in third class than females. The analysis also considers the impact of age on survivability.', 'duration': 424.888, 'highlights': ['First-class females had the highest survival rates, while third-class males had the lowest. 
First-class females had over 90% survival rate, while third-class males had only about 25% survival rate.', 'More males in third class than females, with ratios between males and females being closer in first and second class. There were way more males in third class than females, and the ratios between males and females were closer in first and second class.', "The impact of age on survivability is considered as part of the 'women and children first' hypothesis. The analysis considers the impact of age on survivability as part of the 'women and children first' hypothesis."]}], 'duration': 516.723, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD42759798.jpg', 'highlights': ["The copy and paste reuse technique in ggplot2 allows for quick creation of multiple visualizations, taking only seconds to run through a whole bunch of visualizations, as emphasized by the speaker's experience.", 'R provides fine-grained control over visualizations through the grammar of graphics, enabling the speaker to create visualizations faster in R than in other visualization tools like Tableau and Power BI.', "The speaker emphasizes the productivity benefits of using ggplot2, highlighting that it doesn't require special coding skills and can be achieved through practice.", 'First-class females had the highest survival rates, while third-class males had the lowest. First-class females had over 90% survival rate, while third-class males had only about 25% survival rate.', 'More males in third class than females, with ratios between males and females being closer in first and second class. There were way more males in third class than females, and the ratios between males and females were closer in first and second class.', "The impact of age on survivability is considered as part of the 'women and children first' hypothesis. 
The analysis considers the impact of age on survivability as part of the 'women and children first' hypothesis."]}, {'end': 4264.063, 'segs': [{'end': 3305.97, 'src': 'embed', 'start': 3280.393, 'weight': 0, 'content': [{'end': 3290.019, 'text': 'Okay, so there are a number of visualizations that we can use in ggplot2 that specifically work really well for numeric data, for continuous data.', 'start': 3280.393, 'duration': 9.626}, {'end': 3292.241, 'text': 'First up is the histogram.', 'start': 3290.72, 'duration': 1.521}, {'end': 3299.746, 'text': "So, not surprisingly, one of the things we're probably going to want to know at the beginning, when we start looking at ages in the Titanic, is okay,", 'start': 3292.801, 'duration': 6.945}, {'end': 3305.97, 'text': 'what is the age distribution of passengers all up? And we use a histogram to actually understand that.', 'start': 3299.746, 'duration': 6.224}], 'summary': 'Using ggplot2, histograms are effective for visualizing numeric data distribution.', 'duration': 25.577, 'max_score': 3280.393, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD43280393.jpg'}, {'end': 3353.929, 'src': 'embed', 'start': 3320.788, 'weight': 3, 'content': [{'end': 3326.336, 'text': 'But the single most important thing that we need to see in this code is this part right here, which is bin width.', 'start': 3320.788, 'duration': 5.548}, {'end': 3335.06, 'text': "As you're aware, as I assume that you're aware, histograms, their shape, their visual shape,", 'start': 3328.677, 'duration': 6.383}, {'end': 3339.042, 'text': 'will actually change based on how you define the bin width.', 'start': 3335.06, 'duration': 3.982}, {'end': 3344.565, 'text': 'What this bin width says right now is, please bin my age data into blocks of five years.', 'start': 3339.342, 'duration': 5.223}, {'end': 3353.929, 'text': 'For example, block everybody from zero to five, six through 10, 11 through 15, 16 through 
20.', 'start': 3344.945, 'duration': 8.984}], 'summary': 'The bin width determines the block sizes for age data in histograms.', 'duration': 33.141, 'max_score': 3320.788, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD43320788.jpg'}, {'end': 3449.127, 'src': 'embed', 'start': 3417.847, 'weight': 4, 'content': [{'end': 3422.63, 'text': "That's cool, right? So ggplot2 said, fine, I'll remove the 177 missing values.", 'start': 3417.847, 'duration': 4.783}, {'end': 3427.854, 'text': 'I will create you a histogram with the remaining values that I do have, and it also provides me a warning message.', 'start': 3422.79, 'duration': 5.064}, {'end': 3431.856, 'text': "That's important because later on, as a data scientist,", 'start': 3428.034, 'duration': 3.822}, {'end': 3438.4, 'text': 'I may actually have to do something about those ages either remove those rows or try to impute the missing ages.', 'start': 3431.856, 'duration': 6.544}, {'end': 3438.881, 'text': 'what have you?', 'start': 3438.4, 'duration': 0.481}, {'end': 3442.183, 'text': 'But for right now, for our purposes, we can just ignore that.', 'start': 3439.341, 'duration': 2.842}, {'end': 3449.127, 'text': 'Just notice that the ggplot2 will automatically remove, if necessary, missing values and create you the plot with the data that you do have.', 'start': 3442.283, 'duration': 6.844}], 'summary': 'Ggplot2 removed 177 missing values and created a histogram.', 'duration': 31.28, 'max_score': 3417.847, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD43417847.jpg'}, {'end': 4030.611, 'src': 'embed', 'start': 4004.187, 'weight': 1, 'content': [{'end': 4014.127, 'text': 'And if you notice the second class column here, that this green color, which is our teal color, which indicates survival Overwhelming,', 'start': 4004.187, 'duration': 9.94}, {'end': 4018.668, 'text': 'shows that the youngest children 
in both girls and boys survive in second class.', 'start': 4014.127, 'duration': 4.541}, {'end': 4022.849, 'text': 'It also shows that boys in first class disproportionately survived.', 'start': 4019.348, 'duration': 3.501}, {'end': 4030.611, 'text': 'And it also shows that if you were a boy in third class, you may have in some cases greater than a 50-50 chance not by much,', 'start': 4022.949, 'duration': 7.662}], 'summary': 'Teal color indicates overwhelming survival of youngest children in second class, with boys in first class disproportionately surviving.', 'duration': 26.424, 'max_score': 4004.187, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD44004187.jpg'}, {'end': 4204.722, 'src': 'embed', 'start': 4178.913, 'weight': 2, 'content': [{'end': 4183.676, 'text': 'And then looking at the rest of the boys, all the boys in second class survived, all the boys in first class survived.', 'start': 4178.913, 'duration': 4.763}, {'end': 4193.893, 'text': 'But also notice too that you can see here this is very pronounced, very pronounced that as you get older in third class.', 'start': 4186.107, 'duration': 7.786}, {'end': 4200.418, 'text': 'ooh, survival rates are really really bad, which tells us that look well.', 'start': 4193.893, 'duration': 6.525}, {'end': 4204.722, 'text': 'in general, survival rates for elderly males is pretty bad, all up.', 'start': 4200.418, 'duration': 4.304}], 'summary': 'All boys in second and first class survived, but elderly males in third class had very low survival rates.', 'duration': 25.809, 'max_score': 4178.913, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD44178912.jpg'}], 'start': 3280.393, 'title': 'Analyzing titanic data', 'summary': 'Covers ggplot2 visualizations for numeric data, focusing on the age distribution in the titanic dataset, highlighting missing values, disproportional survival rates, and the use of 
various visualization techniques including histograms, box and whisker plots, and density plots to gain insights into passenger demographics and survival rates.', 'chapters': [{'end': 3463.828, 'start': 3280.393, 'title': 'ggplot2 visualizations for numeric data', 'summary': 'Introduces ggplot2 visualizations, focusing on the histogram for analyzing age distribution in the Titanic dataset, emphasizing the importance of bin width in shaping the visual representation and highlighting the presence of 177 missing values in the age column.', 'duration': 183.435, 'highlights': ['The importance of defining bin width in a histogram to accurately represent the data is emphasized, with the demonstration of binning age data into blocks of five years, affecting the visual shape and providing insight into the age distribution.', 'The presence of 177 missing values in the age column is highlighted, with ggplot2 automatically removing them and providing a warning message, indicating the need for potential data handling or imputation in future analysis.']}, {'end': 3675.614, 'start': 3463.848, 'title': 'Titanic age distribution analysis', 'summary': 'Provides an analysis of the age distribution of passengers on the Titanic, highlighting the prevalence of passengers aged 20 to 40, the disproportionately high survival rate of children, and the lower survivability of older passengers.', 'duration': 211.766, 'highlights': ['The age distribution on the Titanic shows that a large share of passengers fell within the 20-40 age range.', 'Children, especially at the younger end of the age spectrum, had a disproportionately higher survival rate, with more than half surviving in certain age groups.', 'Elderly passengers, particularly those above 50, had notably lower survivability, with a clear decline in survival rates as age increased.', "The visualization provides insights into the relative proportion of those who perished versus 
those who survived at each age bucket, supporting the hypothesis of 'women and children first.'", 'The histogram analysis reveals that there were older passengers, including an 80-year-old, on the Titanic, albeit with lower survivability compared to younger age groups.']}, {'end': 3981.137, 'start': 3676.114, 'title': 'Data visualization techniques', 'summary': 'Explains the use of histograms, box and whisker plots, and density plots to visualize numeric data, emphasizing the importance of using multiple visualization tools to gain a thorough understanding of the data.', 'duration': 305.023, 'highlights': ['The box and whisker plot demonstrates that people who survived tended to be younger, reinforcing the idea that taking multiple looks at the data using various visualization tools is crucial for a thorough understanding.', 'The density plot, with its smoothed shapes and transparency settings, provides a user-friendly alternative to histograms for understanding data distribution, appealing to a wider audience and enabling the visualization of overlapping figures.']}, {'end': 4264.063, 'start': 3981.137, 'title': 'Titanic survival analysis', 'summary': 'Highlights the survival rates of passengers on the Titanic based on age, gender, and passenger class, revealing that young children in second class had the highest survival rate, followed by boys in first class, and it also emphasizes the importance of using ggplot2 for data visualization.', 'duration': 282.926, 'highlights': ['The youngest children in second class had the highest survival rate, with all girls and boys in this category surviving, showcasing a significant impact of passenger class on survival.', 'Boys in first class also had a disproportionately high survival rate, underscoring the influence of both gender and passenger class on survival.
', 'Boys in third class in some cases had slightly better than a 50-50 chance of surviving, indicating the nuanced impact of age, gender, and passenger class on survival.', 'Girls in third class had notably lower survivability compared to boys in the same class, highlighting the combined influence of age, gender, and passenger class on survival.', 'The visualization tools, including density plots and histograms, effectively conveyed the survival patterns, emphasizing the utility of ggplot2 for data analysis and visualization.']}], 'duration': 983.67, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/49fADBfcDD4/pics/49fADBfcDD43280393.jpg', 'highlights': ['The visualization tools, including density plots and histograms, effectively conveyed the survival patterns, emphasizing the utility of ggplot2 for data analysis and visualization.', 'The youngest children in second class had the highest survival rate, with all girls and boys in this category surviving, showcasing a significant impact of passenger class on survival.', 'Boys in first class also had a disproportionately high survival rate, underscoring the influence of both gender and passenger class on survival.', 'The importance of defining bin width in a histogram to accurately represent the data is emphasized, with the demonstration of binning age data into blocks of five years, affecting the visual shape and providing insight into the age distribution.', 'The presence of 177 missing values in the age column is highlighted, with ggplot2 automatically removing them and providing a warning message, indicating the need for potential data handling or imputation in future analysis.']}], 'highlights': ["Dave Langer's extensive experience in training professionals through boot camps and YouTube tutorials demonstrates his expertise in the field.", "Dave Langer's experience in leading a team accountable for managing data 
systems for a $10 billion supply chain operation at Microsoft highlights his practical experience in handling large-scale data operations.", 'The Titanic dataset is extensively used due to its widespread familiarity, making it suitable for diverse audiences and tutorials.', 'The dataset serves as a good proxy for common business data, such as customer profile data, making it useful for predictive modeling.', 'Understanding the columns and their meanings in the dataset is crucial for creating visualizations that answer specific business questions.', 'ggplot2 is important for creating intuitive visualizations and is widely used in the data science community, as demonstrated through real-world analogs such as customer churn, fraud detection, and understanding conversion rates.', 'The necessity of data for creating a ggplot2 visualization is emphasized, constituting a fundamental requirement.', 'The significance of aesthetics in mapping data to the visualization is highlighted, emphasizing the need for code-based aesthetic mappings as a requisite for ggplot2.', 'The necessity of layers for defining the visualization is explained, stressing the need for providing specific functions like geom functions to render the visualization as intended.', 'The Titanic dataset contains 891 observations across 12 columns of data, making it a valuable resource for educational purposes at Data Science Dojo.', 'About 62% of the passengers unfortunately perished on the Titanic, and about 38% survived. 
Demonstrates the key insight into the survival rates of the Titanic data set.', 'Females survived disproportionately more than males did on the Titanic.', "The copy and paste reuse technique in ggplot2 allows for quick creation of multiple visualizations, taking only seconds to run through a whole bunch of visualizations, as emphasized by the speaker's experience.", 'R provides fine-grained control over visualizations through the grammar of graphics, enabling the speaker to create visualizations faster in R than in other visualization tools like Tableau and Power BI.', 'The visualization tools, including density plots and histograms, effectively conveyed the survival patterns, emphasizing the utility of ggplot2 for data analysis and visualization.', 'The youngest children in second class had the highest survival rate, with all girls and boys in this category surviving, showcasing a significant impact of passenger class on survival.']}
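The bar chart by sex and the five-year age histogram described in the segments above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the presenter's actual code: the `toy` data frame is made-up data rather than the real Titanic file, and the column names `Survived`, `Sex`, and `Age` are assumed to match the dataset used in the webinar.

```r
library(ggplot2)

# Hypothetical toy data: NOT the real Titanic file, just enough rows to plot.
# Column names (Survived, Sex, Age) are assumed to match the webinar's dataset.
toy <- data.frame(
  Survived = factor(c(0, 0, 1, 1, 0, 1, 0, 1)),
  Sex      = factor(c("male", "male", "female", "female",
                      "male", "female", "male", "male")),
  Age      = c(22, 35, 28, NA, 54, 4, NA, 2)
)

# Bar chart: x mapped to Sex, fill mapped to Survived (counts per group).
p_sex <- ggplot(toy, aes(x = Sex, fill = Survived)) +
  geom_bar() +
  labs(y = "Passenger Count", title = "Survival by Sex (toy data)")

# Histogram: binwidth = 5 groups ages into five-year blocks.
# Rows with a missing Age are dropped with a warning when the plot is drawn.
p_age <- ggplot(toy, aes(x = Age, fill = Survived)) +
  geom_histogram(binwidth = 5) +
  labs(y = "Passenger Count", x = "Age (binwidth = 5)")

# Count missing ages up front so the warning is not a surprise.
n_missing_age <- sum(is.na(toy$Age))

print(p_sex)
print(p_age)
```

The second plot reuses the first one's structure with only the x mapping and the geom changed, which is the copy, paste, and tweak workflow the speaker describes. Counting the `NA` ages before plotting is an addition of this sketch; the transcript only notes that ggplot2 warns about and removes missing values.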