title

Learn Data Science Tutorial - Full Course for Beginners

description

Learn Data Science in this full tutorial course for absolute beginners. Data science is considered the "sexiest job of the 21st century." You'll learn the important elements of data science. You'll be introduced to the principles, practices, and tools that make data science the powerful medium for critical insight in business and research. You'll have a solid foundation for future learning and applications in your work. With data science, you can do what you want to do, and do it better. This course covers the foundations of data science, data sourcing, coding, mathematics, and statistics.
💻 Course created by Barton Poulson from datalab.cc.
🔗 Check out the datalab.cc YouTube channel: https://www.youtube.com/user/datalabcc
🔗 Watch more free data science courses at http://datalab.cc/
⭐️ Course Contents ⭐️
⌨️ Part 1: Data Science: An Introduction: Foundations of Data Science
- Welcome (1.1)
- Demand for Data Science (2.1)
- The Data Science Venn Diagram (2.2)
- The Data Science Pathway (2.3)
- Roles in Data Science (2.4)
- Teams in Data Science (2.5)
- Big Data (3.1)
- Coding (3.2)
- Statistics (3.3)
- Business Intelligence (3.4)
- Do No Harm (4.1)
- Methods Overview (5.1)
- Sourcing Overview (5.2)
- Coding Overview (5.3)
- Math Overview (5.4)
- Statistics Overview (5.5)
- Machine Learning Overview (5.6)
- Interpretability (6.1)
- Actionable Insights (6.2)
- Presentation Graphics (6.3)
- Reproducible Research (6.4)
- Next Steps (7.1)
⌨️ Part 2: Data Sourcing: Foundations of Data Science (1:39:46)
- Welcome (1.1)
- Metrics (2.1)
- Accuracy (2.2)
- Social Context of Measurement (2.3)
- Existing Data (3.1)
- APIs (3.2)
- Scraping (3.3)
- New Data (4.1)
- Interviews (4.2)
- Surveys (4.3)
- Card Sorting (4.4)
- Lab Experiments (4.5)
- A/B Testing (4.6)
- Next Steps (5.1)
⌨️ Part 3: Coding (2:32:42)
- Welcome (1.1)
- Spreadsheets (2.1)
- Tableau Public (2.2)
- SPSS (2.3)
- JASP (2.4)
- Other Software (2.5)
- HTML (3.1)
- XML (3.2)
- JSON (3.3)
- R (4.1)
- Python (4.2)
- SQL (4.3)
- C, C++, & Java (4.4)
- Bash (4.5)
- Regex (5.1)
- Next Steps (6.1)
⌨️ Part 4: Mathematics (4:01:09)
- Welcome (1.1)
- Elementary Algebra (2.1)
- Linear Algebra (2.2)
- Systems of Linear Equations (2.3)
- Calculus (2.4)
- Calculus & Optimization (2.5)
- Big O (3.1)
- Probability (3.2)
⌨️ Part 5: Statistics (4:44:03)
- Welcome (1.1)
- Exploration Overview (2.1)
- Exploratory Graphics (2.2)
- Exploratory Statistics (2.3)
- Descriptive Statistics (2.4)
- Inferential Statistics (3.1)
- Hypothesis Testing (3.2)
- Estimation (3.3)
- Estimators (4.1)
- Measures of Fit (4.2)
- Feature Selection (4.3)
- Problems in Modeling (4.4)
- Model Validation (4.5)
- DIY (4.6)
- Next Step (5.1)

detail

{'title': 'Learn Data Science Tutorial - Full Course for Beginners', 'heatmap': [{'end': 1057.959, 'start': 633.552, 'weight': 0.838}, {'end': 3170.959, 'start': 2951.699, 'weight': 0.749}, {'end': 12677.64, 'start': 12466.178, 'weight': 1}], 'summary': 'The tutorial covers an introduction to data science, data science fundamentals including career prospects, effective data presentation, best practices, accuracy and sourcing, collection and analysis techniques, tools and techniques, statistical analysis software, working with html, xml, json, python, jupyter, sql, c/c++, java, mathematics, probability and statistics in decision making, multivariate distributions analysis, and statistics and data analysis, presenting various concepts, methods, and tools in detail.', 'chapters': [{'end': 106.661, 'segs': [{'end': 85.484, 'src': 'embed', 'start': 43.252, 'weight': 0, 'content': [{'end': 49.913, 'text': 'the important thing is that data science is not so much a technical discipline, but creative.', 'start': 43.252, 'duration': 6.661}, {'end': 51.955, 'text': "And really, that's true.", 'start': 50.593, 'duration': 1.362}, {'end': 60.682, 'text': 'The reason I say that is because in data science, you use tools that come from coding and statistics and from math,', 'start': 52.235, 'duration': 8.447}, {'end': 64.105, 'text': 'but you use those to work creatively with data.', 'start': 60.682, 'duration': 3.423}, {'end': 74.054, 'text': "The idea is that there's always more than one way to solve a problem or answer a question And, most importantly, to get insight, because the goal,", 'start': 64.766, 'duration': 9.288}, {'end': 77.437, 'text': 'no matter how you go about it, is to get insight from your data.', 'start': 74.054, 'duration': 3.383}, {'end': 85.484, 'text': 'And what makes data science unique compared to so many other things, is that you try to listen to all of your data,', 'start': 77.897, 'duration': 7.587}], 'summary': 'Data science blends technical tools 
with creativity to gain insights from data.', 'duration': 42.232, 'max_score': 43.252, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3043252.jpg'}], 'start': 2.907, 'title': 'Introduction to data science', 'summary': 'Provides a non-technical overview of data science, emphasizing its creative nature and the importance of gaining insight from data, highlighting the inclusive approach to analysis and problem-solving.', 'chapters': [{'end': 106.661, 'start': 2.907, 'title': 'Introduction to data science', 'summary': 'Provides a non-technical overview of data science, emphasizing its creative nature and the importance of gaining insight from data, highlighting the inclusive approach to analysis and problem-solving.', 'duration': 103.754, 'highlights': ['Data science emphasizes creativity and the use of tools from coding, statistics, and math to gain insight from data, promoting inclusivity in analysis and problem-solving.', 'The technical aspects of data science are not as important as the creative use of tools to work with data, promoting multiple problem-solving approaches and inclusive analysis.', "Data science aims to extract insight from data, emphasizing the importance of listening to all data, even when it doesn't fit standard approaches, to gain a comprehensive understanding of the subject."]}], 'duration': 103.754, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj302907.jpg', 'highlights': ['Data science emphasizes creativity and the use of tools from coding, statistics, and math to gain insight from data, promoting inclusivity in analysis and problem-solving.', "Data science aims to extract insight from data, emphasizing the importance of listening to all data, even when it doesn't fit standard approaches, to gain a comprehensive understanding of the subject.", 'The technical aspects of data science are not as important as the creative use of tools 
to work with data, promoting multiple problem-solving approaches and inclusive analysis.']}, {'end': 4438.229, 'segs': [{'end': 332.54, 'src': 'embed', 'start': 297.346, 'weight': 0, 'content': [{'end': 300.788, 'text': 'So this means actual practicing data scientists.', 'start': 297.346, 'duration': 3.442}, {'end': 307.877, 'text': "That's a huge number, but almost 10 times as high is the 1.5 million", 'start': 301.388, 'duration': 6.489}, {'end': 313.243, 'text': 'more data-savvy managers who will be needed to take full advantage of big data in the United States.', 'start': 307.877, 'duration': 5.366}, {'end': 319.612, 'text': "Now, that's people who aren't necessarily doing the analysis but have to understand it, who have to speak data.", 'start': 314.165, 'duration': 5.447}, {'end': 332.54, 'text': 'And one of the main purposes of this particular course is to help people who may or may not be the practicing data scientists learn to understand what they can get out of data and some of the methods used to get there.', 'start': 320.172, 'duration': 12.368}], 'summary': 'An additional 1.5 million data-savvy managers will be needed to take full advantage of big data in the US.', 'duration': 35.194, 'max_score': 297.346, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj30297346.jpg'}, {'end': 363.785, 'src': 'embed', 'start': 338.144, 'weight': 2, 'content': [{'end': 343.888, 'text': 'And that will bring you to this webpage, the 25 hottest job skills that got people hired in 2014.', 'start': 338.144, 'duration': 5.744}, {'end': 349.632, 'text': 'And take a look at number one here, statistical analysis and data mining, very closely related to data science.', 'start': 343.888, 'duration': 5.744}, {'end': 361.463, 'text': 'And just to be clear, this was number one in Australia, in Brazil, in Canada, in France, in India, in the Netherlands, in South Africa,', 'start': 350.132, 'duration': 11.331}, {'end': 363.785, 
'text': 'in the United Arab Emirates, in the United Kingdom.', 'start': 361.463, 'duration': 2.322}], 'summary': 'Statistical analysis and data mining was the top job skill in various countries in 2014.', 'duration': 25.641, 'max_score': 338.144, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj30338144.jpg'}, {'end': 441.853, 'src': 'embed', 'start': 411.252, 'weight': 1, 'content': [{'end': 414.754, 'text': 'We have physicians or doctors, dentists and lawyers and so on.', 'start': 411.252, 'duration': 3.502}, {'end': 422.638, 'text': "Now if we add data scientists to this list using data from O'Reilly.com, we have to push things around the site.", 'start': 415.455, 'duration': 7.183}, {'end': 429.662, 'text': 'And it goes in third with an average total salary, not the base that we had in the other one, but the total compensation,', 'start': 423.238, 'duration': 6.424}, {'end': 432.724, 'text': 'of about $144,000 a year.', 'start': 430.402, 'duration': 2.322}, {'end': 434.305, 'text': "That's extraordinary.", 'start': 433.364, 'duration': 0.941}, {'end': 441.853, 'text': 'So, in sum, what do we get from all of this? 
First off, we learned that there is a very high demand for data science.', 'start': 435.266, 'duration': 6.587}], 'summary': 'Data scientists have an average total compensation of about $144,000 a year, indicating high demand for data science.', 'duration': 30.601, 'max_score': 411.252, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj30411252.jpg'}, {'end': 1057.959, 'src': 'heatmap', 'start': 633.552, 'weight': 0.838, 'content': [{'end': 640.575, 'text': "And the reason you need the math is because that's going to help you choose the appropriate procedures to answer the question with the data that you have.", 'start': 633.552, 'duration': 7.023}, {'end': 645.917, 'text': "And probably even more importantly, it's going to help you diagnose problems when things don't go as expected.", 'start': 641.255, 'duration': 4.662}, {'end': 651.54, 'text': "And given that you're trying to do new things with new data in new ways, you're probably going to come across problems.", 'start': 646.517, 'duration': 5.023}, {'end': 656.943, 'text': "And so the ability to understand the mechanics of what's going on is going to give you a big advantage.", 'start': 651.58, 'duration': 5.363}, {'end': 662.205, 'text': 'And the third element of the data science Venn diagram is some sort of domain expertise.', 'start': 657.923, 'duration': 4.282}, {'end': 664.947, 'text': "Think of it as expertise in the field that you're in.", 'start': 662.666, 'duration': 2.281}, {'end': 667.028, 'text': 'Business settings are common.', 'start': 665.607, 'duration': 1.421}, {'end': 672.931, 'text': 'You need to know about the goals of that field, the methods that are used, the constraints that people come across.', 'start': 667.588, 'duration': 5.343}, {'end': 678.135, 'text': "And it's important because whatever your results are, you need to be able to implement them well.", 'start': 673.711, 'duration': 4.424}, {'end': 682.197, 'text': "Data science 
is very practical, and it's designed to accomplish something.", 'start': 678.675, 'duration': 3.522}, {'end': 692.505, 'text': 'And your familiarity with a particular field of practice is going to make it that much easier and more impactful when you implement the results of your analysis.', 'start': 682.698, 'duration': 9.807}, {'end': 696.514, 'text': "Now let's go back to our Venn diagram here just for a moment.", 'start': 693.993, 'duration': 2.521}, {'end': 700.715, 'text': 'Because this is a Venn, we also have these intersections of two circles at a time.', 'start': 697.094, 'duration': 3.621}, {'end': 709.598, 'text': 'At the top is machine learning, at the bottom right is traditional research, and on the bottom left is what Drew Conway called the danger zone.', 'start': 701.395, 'duration': 8.203}, {'end': 710.898, 'text': 'Let me talk about each of these.', 'start': 709.638, 'duration': 1.26}, {'end': 714.21, 'text': 'First off, machine learning or ML.', 'start': 712.329, 'duration': 1.881}, {'end': 717.453, 'text': 'Now you think about machine learning,', 'start': 714.731, 'duration': 2.722}, {'end': 724.599, 'text': 'and the idea here is that it represents coding or statistical programming and mathematics without any real domain expertise.', 'start': 717.453, 'duration': 7.146}, {'end': 727.561, 'text': 'Sometimes these are referred to as black box models.', 'start': 725.139, 'duration': 2.422}, {'end': 733.005, 'text': "They kind of throw data in and you don't even necessarily have to know what it means or what language it's in,", 'start': 727.941, 'duration': 5.064}, {'end': 736.028, 'text': "and it'll just kind of crunch through it all and it'll give you some regularities.", 'start': 733.005, 'duration': 3.023}, {'end': 738.629, 'text': 'That can be very helpful,', 'start': 736.828, 'duration': 1.801}, {'end': 747.053, 'text': "but machine learning is considered slightly different from data science because it doesn't involve the particular applications 
in a specific domain.", 'start': 738.629, 'duration': 8.424}, {'end': 751.146, 'text': "Also, there's traditional research.", 'start': 748.965, 'duration': 2.181}, {'end': 758.649, 'text': 'This is where you have math or statistics and you have domain knowledge, often very intensive domain knowledge, but without the coding or programming.', 'start': 751.186, 'duration': 7.463}, {'end': 764.131, 'text': 'Now, you can get away with that because the data that you use in traditional research is highly structured.', 'start': 759.129, 'duration': 5.002}, {'end': 768.693, 'text': 'It comes in rows and columns, is typically complete, and is typically ready for analysis.', 'start': 764.191, 'duration': 4.502}, {'end': 775.576, 'text': "Doesn't mean your life is easy, because now you have to expend an enormous amount of effort in the method,", 'start': 769.293, 'duration': 6.283}, {'end': 780.358, 'text': 'in designing the project and in the interpretation of the data.', 'start': 775.576, 'duration': 4.782}, {'end': 786.441, 'text': 'So still very heavy intellectual, cognitive work, but it comes in a different place.', 'start': 780.358, 'duration': 6.083}, {'end': 794.225, 'text': "And then finally there's what Conway called the danger zone, and that's the intersection of coding and domain knowledge,", 'start': 786.441, 'duration': 7.784}, {'end': 795.726, 'text': 'but without math or statistics.', 'start': 794.225, 'duration': 1.501}, {'end': 798.908, 'text': "Now, he says it's unlikely to happen, and that's probably true.", 'start': 795.726, 'duration': 3.182}, {'end': 802.749, 'text': 'On the other hand, I can think of some common examples, what are called word counts,', 'start': 799.708, 'duration': 3.041}, {'end': 807.931, 'text': 'where you take a large document or a series of documents and you count how often each word appears in there.', 'start': 802.749, 'duration': 5.182}, {'end': 809.932, 'text': 'That can actually tell you some important things.', 'start': 
808.311, 'duration': 1.621}, {'end': 814.794, 'text': 'And also drawing maps and showing how things change across place and maybe across time.', 'start': 810.332, 'duration': 4.462}, {'end': 820.076, 'text': "You don't necessarily have to have the math, but it can be very insightful and helpful.", 'start': 815.714, 'duration': 4.362}, {'end': 825.768, 'text': "So, let's think about a couple of backgrounds where people come from here.", 'start': 821.866, 'duration': 3.902}, {'end': 826.968, 'text': 'First is coding.', 'start': 825.988, 'duration': 0.98}, {'end': 831.49, 'text': 'You can have people who are coders, who can do math, stats, and business.', 'start': 827.869, 'duration': 3.621}, {'end': 835.232, 'text': 'So, you get the three things, and this is probably the most common.', 'start': 832.03, 'duration': 3.202}, {'end': 837.713, 'text': 'Most of the people come from a programming background.', 'start': 835.292, 'duration': 2.421}, {'end': 845.699, 'text': "On the other hand, there's also stats or statistics, and you can get statisticians who can code and who also can do business.", 'start': 838.653, 'duration': 7.046}, {'end': 847.92, 'text': "That's less common, but it does happen.", 'start': 845.699, 'duration': 2.221}, {'end': 852.184, 'text': 'And finally, there are people who come into data science from a particular domain.', 'start': 847.92, 'duration': 4.264}, {'end': 857.047, 'text': "These are, for instance, business people who can code and do numbers, and they're the least common.", 'start': 852.184, 'duration': 4.863}, {'end': 861.471, 'text': 'But all of these are important to data science.', 'start': 857.047, 'duration': 4.424}, {'end': 864.613, 'text': "And so, in sum, here's what we can take away.", 'start': 861.471, 'duration': 3.142}, {'end': 867.675, 'text': 'First, several fields make up data science.', 'start': 864.613, 'duration': 3.062}, {'end': 873.963, 'text': "Second, diverse skills and backgrounds are important and they're needed in 
data science.", 'start': 868.877, 'duration': 5.086}, {'end': 878.889, 'text': "And third, there are many roles involved because there's a lot of different things that need to happen.", 'start': 874.764, 'duration': 4.125}, {'end': 881.553, 'text': "We'll say more about that in our next movie.", 'start': 879.19, 'duration': 2.363}, {'end': 893.234, 'text': 'The next step in our data science introduction and our definition of data science is to talk about the data science pathway.', 'start': 885.748, 'duration': 7.486}, {'end': 901.16, 'text': "So I like to think of this as when you're working on a major project, you got to do one step at a time to get from here to there.", 'start': 894.395, 'duration': 6.765}, {'end': 906.305, 'text': 'In data science, you can take the various steps and you can put them into a couple of general categories.', 'start': 902.061, 'duration': 4.244}, {'end': 909.688, 'text': 'First, there are the steps that involve planning.', 'start': 907.005, 'duration': 2.683}, {'end': 912.27, 'text': "Second, there's the data prep.", 'start': 910.228, 'duration': 2.042}, {'end': 915.373, 'text': "Third, there's the actual modeling of the data.", 'start': 912.79, 'duration': 2.583}, {'end': 917.635, 'text': "And fourth, there's the follow-up.", 'start': 915.813, 'duration': 1.822}, {'end': 920.397, 'text': 'And there are several steps within each of these.', 'start': 918.115, 'duration': 2.282}, {'end': 922.079, 'text': "I'll explain each of them briefly.", 'start': 920.738, 'duration': 1.341}, {'end': 925.088, 'text': "First, let's talk about planning.", 'start': 923.607, 'duration': 1.481}, {'end': 933.214, 'text': "The first thing you need to do is you need to define the goals of your project so you know how to use your resources well and also so you know when you're done.", 'start': 925.649, 'duration': 7.565}, {'end': 942.241, 'text': 'Second, you need to organize your resources so you might have data from several different sources, you might 
have different software packages,', 'start': 933.995, 'duration': 8.246}, {'end': 945.043, 'text': 'you might have different people, which gets us to the third one.', 'start': 942.241, 'duration': 2.802}, {'end': 948.526, 'text': 'You need to coordinate the people so they can work together productively.', 'start': 945.123, 'duration': 3.403}, {'end': 954.41, 'text': "If you're doing a handoff, It needs to be clear who's going to do what, and how their work is going to go together.", 'start': 948.566, 'duration': 5.844}, {'end': 960.756, 'text': 'And then really to state the obvious, you need to schedule the project so things can move along smoothly.', 'start': 955.031, 'duration': 5.725}, {'end': 962.417, 'text': 'you can finish in a reasonable amount of time.', 'start': 960.756, 'duration': 1.661}, {'end': 969.146, 'text': "Next is the data prep where you're taking like food prep and getting the raw ingredients ready.", 'start': 964.602, 'duration': 4.544}, {'end': 975.531, 'text': 'First, of course, is you need to get the data and it can come from many different sources and be in many different formats.', 'start': 969.826, 'duration': 5.705}, {'end': 977.893, 'text': 'You need to clean the data.', 'start': 976.431, 'duration': 1.462}, {'end': 984.338, 'text': 'And the sad thing is this tends to be a very large part of any data science project.', 'start': 978.553, 'duration': 5.785}, {'end': 988.441, 'text': "And that's because you're bringing in unusual data from a lot of different places.", 'start': 984.738, 'duration': 3.703}, {'end': 996.413, 'text': 'You also want to explore the data, that is, really see what it looks like, how many people are in each group,', 'start': 990.017, 'duration': 6.396}, {'end': 1002.173, 'text': "what the shape of the distributions are like, what's associated with what, And you may need to refine the data,", 'start': 996.413, 'duration': 5.76}, {'end': 1010.036, 'text': 'and that means choosing variables to include, choosing cases 
to include or exclude, making any transformations to the data you need to do.', 'start': 1002.173, 'duration': 7.863}, {'end': 1014.198, 'text': 'And of course these steps kind of can bounce back and forth from one to the other.', 'start': 1010.416, 'duration': 3.782}, {'end': 1019.22, 'text': 'The third group is modeling or statistical modeling.', 'start': 1015.899, 'duration': 3.321}, {'end': 1022.422, 'text': 'This is where you actually want to create the statistical model.', 'start': 1019.28, 'duration': 3.142}, {'end': 1026.924, 'text': 'So for instance you might do a regression analysis or you might do a neural network.', 'start': 1022.862, 'duration': 4.062}, {'end': 1031.886, 'text': 'But whatever you do, once you create your model, you have to validate the model.', 'start': 1028.064, 'duration': 3.822}, {'end': 1039.429, 'text': 'You might do that with a holdout validation, you might do it really with a very small replication if you can.', 'start': 1032.445, 'duration': 6.984}, {'end': 1042.431, 'text': 'You also need to evaluate the model.', 'start': 1040.69, 'duration': 1.741}, {'end': 1048.434, 'text': 'So, once you know that the model is accurate, what does it actually mean and how much does it tell you?', 'start': 1042.991, 'duration': 5.443}, {'end': 1052.716, 'text': 'And then, finally, you need to refine the model.', 'start': 1049.554, 'duration': 3.162}, {'end': 1055.838, 'text': 'So, for instance, there may be variables you want to throw out.', 'start': 1052.776, 'duration': 3.062}, {'end': 1057.959, 'text': 'there may be additional ones you want to include.', 'start': 1055.838, 'duration': 2.121}], 'summary': 'Data science involves math, domain expertise, and diverse skills. 
the pathway includes planning, data prep, modeling, and follow-up.', 'duration': 424.407, 'max_score': 633.552, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj30633552.jpg'}, {'end': 1150.244, 'src': 'embed', 'start': 1125.971, 'weight': 3, 'content': [{'end': 1132.794, 'text': 'document what you have and make it possible for you or for others to repeat the analysis or develop off of it in the future.', 'start': 1125.971, 'duration': 6.823}, {'end': 1138.517, 'text': 'Those are the general steps of what I consider the data science pathway.', 'start': 1134.335, 'duration': 4.182}, {'end': 1141.499, 'text': 'And in sum, what we get from this is three things.', 'start': 1139.038, 'duration': 2.461}, {'end': 1145.821, 'text': "First, data science isn't just a technical field, it's not just coding.", 'start': 1142.019, 'duration': 3.802}, {'end': 1150.244, 'text': 'Things like planning and presenting and implementing are just as important.', 'start': 1146.281, 'duration': 3.963}], 'summary': 'Document data for reproducible analysis. 
data science involves more than just coding, includes planning, presenting, and implementing.', 'duration': 24.273, 'max_score': 1125.971, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj301125971.jpg'}, {'end': 1217.09, 'src': 'embed', 'start': 1189.945, 'weight': 4, 'content': [{'end': 1194.606, 'text': "So let's talk about some of the roles involved in data science and how they contribute to the projects.", 'start': 1189.945, 'duration': 4.661}, {'end': 1197.567, 'text': "First off, let's take a look at engineers.", 'start': 1195.526, 'duration': 2.041}, {'end': 1204.435, 'text': 'These are people who focus on the backend hardware, for instance the servers and the software that runs them.', 'start': 1198.147, 'duration': 6.288}, {'end': 1213.526, 'text': 'This is what makes data science possible, and it includes people like developers, software developers or database administrators,', 'start': 1205.236, 'duration': 8.29}, {'end': 1217.09, 'text': 'and they provide the foundation for the rest of the work.', 'start': 1213.526, 'duration': 3.564}], 'summary': 'Data science roles like engineers focus on backend hardware and software to make data science possible.', 'duration': 27.145, 'max_score': 1189.945, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj301189945.jpg'}, {'end': 1912.5, 'src': 'embed', 'start': 1886.119, 'weight': 5, 'content': [{'end': 1890.742, 'text': "You're going to need coding and statistics and math and you're going to have to have domain expertise.", 'start': 1886.119, 'duration': 4.623}, {'end': 1894.184, 'text': "primarily because of the variety you're dealing with.", 'start': 1891.622, 'duration': 2.562}, {'end': 1897.547, 'text': 'But taken all together, you do have to have all of it.', 'start': 1894.625, 'duration': 2.922}, {'end': 1900.009, 'text': "So in sum, here's what we get.", 'start': 1898.148, 'duration': 1.861}, {'end': 
1904.733, 'text': 'Big data is not equal to, is not identical to, data science.', 'start': 1900.79, 'duration': 3.943}, {'end': 1906.515, 'text': "Now there's common ground.", 'start': 1905.334, 'duration': 1.181}, {'end': 1910.478, 'text': 'And a lot of people who are good at big data are good at data science and vice versa.', 'start': 1906.975, 'duration': 3.503}, {'end': 1912.5, 'text': 'But they are conceptually distinct.', 'start': 1910.938, 'duration': 1.562}], 'summary': 'Big data and data science require coding, statistics, math, and domain expertise. They are conceptually distinct but share common ground.', 'duration': 26.381, 'max_score': 1886.119, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj301886119.jpg'}, {'end': 2748.45, 'src': 'embed', 'start': 2720.928, 'weight': 6, 'content': [{'end': 2724.01, 'text': 'Next in our very ominous picture is data security.', 'start': 2720.928, 'duration': 3.082}, {'end': 2731.036, 'text': 'And the idea here is that when you go through all the effort to gather data, to clean up and prepare for an analysis,', 'start': 2724.831, 'duration': 6.205}, {'end': 2733.678, 'text': "you've created something that's very valuable to a lot of people.", 'start': 2731.036, 'duration': 2.642}, {'end': 2741.924, 'text': 'And you have to be concerned about hackers trying to come in and steal the data, especially if the data is not anonymous and it has identifiers in it.', 'start': 2734.178, 'duration': 7.746}, {'end': 2748.45, 'text': 'And so there is an additional burden placed on the analyst to ensure, to the best of their ability,', 'start': 2742.725, 'duration': 5.725}], 'summary': 'Data security is crucial when handling valuable and identifiable data, with analysts facing the burden of safeguarding it against potential hackers.', 'duration': 27.522, 'max_score': 2720.928, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj302720928.jpg'}, {'end': 3170.959, 'src': 'heatmap', 'start': 2951.699, 'weight': 0.749, 'content': [{'end': 2958.862, 'text': 'And so we want to focus primarily on insight and the tools and the tech as they serve to further that goal.', 'start': 2951.699, 'duration': 7.163}, {'end': 2965.379, 'text': "Now, there's a few general categories we're going to talk about, again, with an overview for each of these.", 'start': 2960.154, 'duration': 5.225}, {'end': 2968.483, 'text': 'The first one is sourcing, or data sourcing.', 'start': 2966.16, 'duration': 2.323}, {'end': 2973.408, 'text': 'That is, how to get the data that goes into data science, the raw materials that you need.', 'start': 2968.643, 'duration': 4.765}, {'end': 2975.37, 'text': 'The second is coding.', 'start': 2974.148, 'duration': 1.222}, {'end': 2981.416, 'text': 'That, again, is computer programming that can be used to obtain and manipulate and analyze the data.', 'start': 2975.71, 'duration': 5.706}, {'end': 2990.386, 'text': 'After that, a tiny bit of math, and that is the mathematics behind data science methods that really form the foundations of the procedures.', 'start': 2982.859, 'duration': 7.527}, {'end': 3000.235, 'text': 'And then stats, the statistical methods that are frequently used to summarize and analyze data, especially as applied to data science.', 'start': 2992.588, 'duration': 7.647}, {'end': 3003.819, 'text': "And then there's Machine Learning ML.", 'start': 3001.618, 'duration': 2.201}, {'end': 3011.444, 'text': 'This is a collection of methods for finding clusters in the data for predicting categories or scores on interesting outcomes.', 'start': 3004.54, 'duration': 6.904}, {'end': 3019.388, 'text': "And even across these five things, even then, the presentations aren't too techy-crunchy.", 'start': 3012.464, 'duration': 6.924}, {'end': 3022.991, 'text': "They're basically still friendly, 
and really, that's the way it is.", 'start': 3019.428, 'duration': 3.563}, {'end': 3027.214, 'text': 'And so that is the overview of the overviews.', 'start': 3023.991, 'duration': 3.223}, {'end': 3036.544, 'text': "In sum, we need to remember that data science includes tech, but data science is greater than tech, it's more than those procedures.", 'start': 3027.815, 'duration': 8.729}, {'end': 3045.253, 'text': 'And above all, that tech, while important to data science, is still simply a means to insight in data.', 'start': 3037.345, 'duration': 7.908}, {'end': 3055.704, 'text': "The first step in discussing data science methods is to look at the methods of sourcing or getting data that's used in data science.", 'start': 3047.981, 'duration': 7.723}, {'end': 3060.907, 'text': 'You can think of this as getting the raw materials that go into your analyses.', 'start': 3056.625, 'duration': 4.282}, {'end': 3064.308, 'text': "Now you've got a few different choices when it comes to this.", 'start': 3061.447, 'duration': 2.861}, {'end': 3073.412, 'text': "In data science, you can use existing data, you can use something called data APIs, you can scrape web data or you can make data.", 'start': 3064.308, 'duration': 9.104}, {'end': 3078.315, 'text': "We'll talk about each of those very briefly in a non-technical manner.", 'start': 3073.792, 'duration': 4.523}, {'end': 3081.598, 'text': 'For right now, let me say something about existing data.', 'start': 3079.036, 'duration': 2.562}, {'end': 3086.362, 'text': 'This is data that already is at hand, and it might be in-house data.', 'start': 3081.998, 'duration': 4.364}, {'end': 3089.724, 'text': 'So if you work for a company, it might be your company records.', 'start': 3086.382, 'duration': 3.342}, {'end': 3098.09, 'text': 'Or you might have open data; for instance, many governments and many scientific organizations make their data available to the public.', 'start': 3090.465, 'duration': 7.625}, {'end': 3101.455, 'text': "And 
then there's also third-party data.", 'start': 3099.091, 'duration': 2.364}, {'end': 3107.584, 'text': "This is usually data that you buy from a vendor, but it exists and it's very easy to plug it in and go.", 'start': 3101.515, 'duration': 6.069}, {'end': 3114.119, 'text': "You can also use APIs. Now, that stands for application programming interface.", 'start': 3109.315, 'duration': 4.804}, {'end': 3120.324, 'text': 'And this is something that allows various computer applications to communicate directly with each other.', 'start': 3114.659, 'duration': 5.665}, {'end': 3122.866, 'text': "It's like phones for your computer programs.", 'start': 3120.344, 'duration': 2.522}, {'end': 3125.948, 'text': "It's the most common way of getting web data.", 'start': 3123.787, 'duration': 2.161}, {'end': 3133.294, 'text': "And the beautiful thing about it is it allows you to import that data directly into whatever program or application you're using to analyze the data.", 'start': 3126.068, 'duration': 7.226}, {'end': 3136.477, 'text': 'Next is scraping data.', 'start': 3134.715, 'duration': 1.762}, {'end': 3142.321, 'text': "And this is where you want to use data that's on the web, but they don't have an existing API.", 'start': 3136.837, 'duration': 5.484}, {'end': 3150.807, 'text': "And what that means is usually data that's in HTML web tables and pages, maybe PDFs.", 'start': 3142.962, 'duration': 7.845}, {'end': 3156.232, 'text': 'And you can do this either by using specialized applications for scraping data.', 'start': 3151.008, 'duration': 5.224}, {'end': 3162.136, 'text': 'Or you can do it in a programming language like R or Python and write the code to do the data scraping.', 'start': 3156.932, 'duration': 5.204}, {'end': 3170.959, 'text': 'Or another option is to make data, and this lets you get exactly what you need; you can be very specific.', 'start': 3163.533, 'duration': 7.426}], 'summary': 'Data science overview covers sourcing, coding, math, stats, and machine 
learning. various options for data sourcing including existing data, apis, web scraping, and creating data.', 'duration': 219.26, 'max_score': 2951.699, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj302951699.jpg'}, {'end': 3036.544, 'src': 'embed', 'start': 3004.54, 'weight': 10, 'content': [{'end': 3011.444, 'text': 'This is a collection of methods for finding clusters in the data for predicting categories or scores on interesting outcomes.', 'start': 3004.54, 'duration': 6.904}, {'end': 3019.388, 'text': "And even across these five things, even then, the presentations aren't too techy-crunchy.", 'start': 3012.464, 'duration': 6.924}, {'end': 3022.991, 'text': "They're basically still friendly, and really, that's the way it is.", 'start': 3019.428, 'duration': 3.563}, {'end': 3027.214, 'text': 'And so that is the overview of the overviews.', 'start': 3023.991, 'duration': 3.223}, {'end': 3036.544, 'text': "In sum, we need to remember that data science includes tech, but data science is greater than tech, it's more than those procedures.", 'start': 3027.815, 'duration': 8.729}], 'summary': 'Methods for finding clusters in data, emphasizing that data science is more than just tech.', 'duration': 32.004, 'max_score': 3004.54, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj303004540.jpg'}, {'end': 3425.119, 'src': 'embed', 'start': 3398.001, 'weight': 7, 'content': [{'end': 3403.645, 'text': "R is probably the most common, along with Python, a general purpose language, but it's been well adapted for data use.", 'start': 3398.001, 'duration': 5.644}, {'end': 3412.07, 'text': "There's SQL, the structured query language for databases, and very basic languages like C and C++ and Java,", 'start': 3404.485, 'duration': 7.585}, {'end': 3414.632, 'text': 'which are used more in the back end of data science.', 'start': 3412.07, 'duration': 2.562}, {'end': 3421.216, 
'text': "And then there's Bash, the most common command line interface, and regular expressions.", 'start': 3415.833, 'duration': 5.383}, {'end': 3425.119, 'text': "And we'll talk about all of these in other courses here at Datalab.", 'start': 3421.316, 'duration': 3.803}], 'summary': 'R and python are common for data, sql for databases, c/c++/java for backend, bash and regex also used.', 'duration': 27.118, 'max_score': 3398.001, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj303398001.jpg'}, {'end': 3699.693, 'src': 'embed', 'start': 3673.124, 'weight': 8, 'content': [{'end': 3678.145, 'text': "And then Bayes theorem, which is a way of getting what's called a posterior probability,", 'start': 3673.124, 'duration': 5.021}, {'end': 3682.986, 'text': 'can also be a really helpful tool for answering some fundamental questions in data science.', 'start': 3678.145, 'duration': 4.841}, {'end': 3692.092, 'text': 'So in sum, A little bit of math can help you make informed choices when planning your analyses.', 'start': 3684.326, 'duration': 7.766}, {'end': 3699.693, 'text': "Very significantly, it can help you find the problems and fix them when things aren't going right.", 'start': 3693.371, 'duration': 6.322}], 'summary': 'Bayes theorem helps calculate posterior probability, aiding data analysis and problem-solving.', 'duration': 26.569, 'max_score': 3673.124, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj303673124.jpg'}, {'end': 3794.671, 'src': 'embed', 'start': 3765.406, 'weight': 9, 'content': [{'end': 3769.069, 'text': 'You can have exploratory statistics, a numerical exploration of the data.', 'start': 3765.406, 'duration': 3.663}, {'end': 3771.032, 'text': 'And you can have descriptive statistics,', 'start': 3769.49, 'duration': 1.542}, {'end': 3776.117, 'text': 'which are the things that most people would have talked about when they took a statistics 
class in college if they did that.', 'start': 3771.032, 'duration': 5.085}, {'end': 3785.904, 'text': "Next, there's inference, I've got smoke here, because you can infer things about the wind and the air movement by looking at patterns and smoke.", 'start': 3777.938, 'duration': 7.966}, {'end': 3794.671, 'text': "The idea here is that you're trying to take information from samples and infer something about a population.", 'start': 3786.745, 'duration': 7.926}], 'summary': 'Exploratory and descriptive statistics aid in inferring about a population.', 'duration': 29.265, 'max_score': 3765.406, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj303765406.jpg'}, {'end': 4154.323, 'src': 'embed', 'start': 4127.682, 'weight': 11, 'content': [{'end': 4132.526, 'text': "In the last several videos, I've talked about the role in data science of technical things.", 'start': 4127.682, 'duration': 4.844}, {'end': 4137.29, 'text': 'On the other hand, communicating is also central to the practice.', 'start': 4133.227, 'duration': 4.063}, {'end': 4141.133, 'text': 'And the first thing I want to talk about there is interpretability.', 'start': 4137.79, 'duration': 3.343}, {'end': 4151.161, 'text': 'The idea here is that you want to be able to lead people through a path on your data, you want to tell a data driven story.', 'start': 4141.934, 'duration': 9.227}, {'end': 4154.323, 'text': "And that's the entire goal of what we're doing with data science.", 'start': 4151.581, 'duration': 2.742}], 'summary': 'Interpretability and communication are central to data science, aiming to tell a data-driven story and lead people through a path on the data.', 'duration': 26.641, 'max_score': 4127.682, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj304127682.jpg'}], 'start': 107.262, 'title': 'Data science fundamentals', 'summary': 'Introduces the field of data science, highlighting its 
demand and career prospects, with a projected need for 140-190,000 data scientists and 1.5 million data-savvy managers. it explains the data science pathway, diverse roles, contrasts with big data and statistics, and provides an overview of data science methods, programming languages, mathematics, statistics, and machine learning.', 'chapters': [{'end': 962.417, 'start': 107.262, 'title': 'Introduction to data science', 'summary': 'Introduces the field of data science, emphasizing its demand, significance, and career prospects, with quantifiable data showing a projected need for 140-190,000 data scientists and 1.5 million data-savvy managers in the next few years, and data scientists ranking as one of the highest-paying professions.', 'duration': 855.155, 'highlights': ['Data science is projected to have a high demand, with a need for 140-190,000 practicing data scientists and 1.5 million data-savvy managers in the next few years.', 'Data scientists rank as one of the highest-paying professions, with an average total compensation of about $144,000 a year, positioning them among the top 10 highest paying professions.', 'Data science skills, particularly statistical analysis and data mining, are among the top job skills that got people hired in various countries, including Australia, Brazil, Canada, France, India, the Netherlands, South Africa, the United Arab Emirates, and the United Kingdom.']}, {'end': 1586.597, 'start': 964.602, 'title': 'Data science pathway & roles', 'summary': 'Explains the general steps of the data science pathway, emphasizing the importance of data prep, statistical modeling, and follow-up, while also highlighting the diverse roles involved in data science and the significance of team collaboration.', 'duration': 621.995, 'highlights': ['Data prep, statistical modeling, and follow-up are the key steps of the data science pathway, emphasizing the importance of cleaning, exploring, refining data, creating and validating models, and presenting 
and deploying the insights.', "The diverse roles in data science include engineers, big data specialists, researchers, analysts, business people, entrepreneurs, and the elusive 'full stack unicorn,' each contributing unique skills and experiences to the field.", "Team collaboration is essential in data science, as individuals with varying expertise in coding, statistics, design, and business need to work together to compensate for each other's strengths and weaknesses and ensure project competence."]}, {'end': 2002.362, 'start': 1586.998, 'title': 'Data science and big data contrasts', 'summary': 'Discusses the relationship between data science and big data, highlighting their differences and how they can be combined, emphasizing the need for collective skills and expertise in order to achieve insights and project goals.', 'duration': 415.364, 'highlights': ['Collective skills and expertise are essential for data science and big data, emphasizing the need for a combination of coding, math, and domain expertise to achieve insights and project goals.', "Comparison between data science and big data, highlighting the differences and similarities, and the ability to combine the two to form 'Big Data Science'.", 'Contrast between data science and coding, emphasizing the differences in their approach and the complexity involved in data science tasks.']}, {'end': 2719.018, 'start': 2002.803, 'title': 'Comparison of data science and statistics', 'summary': 'Compares data science with programming and statistics, highlighting the differences in tools, statistical ability, training backgrounds of professionals, ethical issues, and the relationship with business intelligence.', 'duration': 716.215, 'highlights': ['Data science tools compared to programming tools', 'Statistical ability as a major separator', 'Divergence between statistics and data science', 'Comparison between data science and business intelligence', 'Ethical issues in data science']}, {'end': 3377.473, 'start': 
2720.928, 'title': 'Data science methods overview', 'summary': 'Discusses the importance of data security in data science, addressing potential biases and overconfidence, and provides an overview of data sourcing and coding methods, emphasizing the need for quality data and the use of various technologies to manipulate data.', 'duration': 656.545, 'highlights': ['Data security is crucial in data science to protect valuable data from hackers, ensuring it is safe and cannot be broken into and stolen, with additional concerns about potential biases and overconfidence in analyses.', 'Data sourcing methods include using existing data, data APIs, web data scraping, and making data, with the importance of checking the quality and meaning of the data to gain valuable insights.', 'Coding in data science involves using various technologies such as specialized applications, data formats, and programming languages to manipulate data and perform procedures to gain insights.']}, {'end': 4438.229, 'start': 3377.473, 'title': 'Data science methods overview', 'summary': 'Discusses the importance of programming languages and tools like r, python, sql, bash, and regular expressions, the role of mathematics in data science, including algebra, calculus, big o, probability theory, and bayes theorem, the significance of statistics in exploring and inferring data, and the overview of machine learning for categorization and prediction. 
additionally, it emphasizes the necessity of interpretability and communication in presenting data-driven stories and analyses.', 'duration': 1060.756, 'highlights': ['The importance of programming languages and tools like R, Python, SQL, Bash, and regular expressions', 'The role of mathematics in data science, including algebra, calculus, big O, probability theory, and Bayes theorem', 'The significance of statistics in exploring and inferring data', 'The overview of machine learning for categorization and prediction', 'The necessity of interpretability and communication in presenting data-driven stories and analyses']}], 'duration': 4330.967, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj30107262.jpg', 'highlights': ['Data science is projected to have a high demand, with a need for 140-190,000 practicing data scientists and 1.5 million data-savvy managers in the next few years.', 'Data scientists rank as one of the highest-paying professions, with an average total compensation of about $144,000 a year, positioning them among the top 10 highest paying professions.', 'Data science skills, particularly statistical analysis and data mining, are among the top job skills that got people hired in various countries, including Australia, Brazil, Canada, France, India, the Netherlands, South Africa, the United Arab Emirates, and the United Kingdom.', 'Data prep, statistical modeling, and follow-up are the key steps of the data science pathway, emphasizing the importance of cleaning, exploring, refining data, creating and validating models, and presenting and deploying the insights.', "The diverse roles in data science include engineers, big data specialists, researchers, analysts, business people, entrepreneurs, and the elusive 'full stack unicorn,' each contributing unique skills and experiences to the field.", 'Collective skills and expertise are essential for data science and big data, emphasizing the need for a 
combination of coding, math, and domain expertise to achieve insights and project goals.', 'Data security is crucial in data science to protect valuable data from hackers, ensuring it is safe and cannot be broken into and stolen, with additional concerns about potential biases and overconfidence in analyses.', 'The importance of programming languages and tools like R, Python, SQL, Bash, and regular expressions', 'The role of mathematics in data science, including algebra, calculus, big O, probability theory, and Bayes theorem', 'The significance of statistics in exploring and inferring data', 'The overview of machine learning for categorization and prediction', 'The necessity of interpretability and communication in presenting data-driven stories and analyses']}, {'end': 5361.551, 'segs': [{'end': 4483.907, 'src': 'embed', 'start': 4438.929, 'weight': 0, 'content': [{'end': 4445.932, 'text': 'They talk about being minimally sufficient, just enough to adequately answer the question.', 'start': 4438.929, 'duration': 7.003}, {'end': 4453.575, 'text': "If you're in commerce, you know about a minimal viable product is sort of the same idea with an analysis here, the minimum viable analysis.", 'start': 4445.952, 'duration': 7.623}, {'end': 4456.806, 'text': "So, here's a few tips.", 'start': 4455.385, 'duration': 1.421}, {'end': 4460.149, 'text': "When you're giving a presentation, more charts, less text.", 'start': 4457.487, 'duration': 2.662}, {'end': 4463.092, 'text': 'Great And then, simplify the charts.', 'start': 4460.59, 'duration': 2.502}, {'end': 4464.873, 'text': "Remove everything that doesn't need to be in there.", 'start': 4463.192, 'duration': 1.681}, {'end': 4469.578, 'text': 'Generally, you want to avoid tables of data, because those are hard to read.', 'start': 4465.554, 'duration': 4.024}, {'end': 4473.841, 'text': 'And then, one more time, because I want to emphasize it, less text.', 'start': 4470.298, 'duration': 3.543}, {'end': 4477.865, 'text': 
'Again, charts, tables can usually carry the message.', 'start': 4474.122, 'duration': 3.743}, {'end': 4480.147, 'text': 'And so, let me give you an example here.', 'start': 4478.686, 'duration': 1.461}, {'end': 4483.907, 'text': "I'm going to give a very famous dataset, Berkeley Admissions.", 'start': 4481.343, 'duration': 2.564}], 'summary': 'Emphasize minimal, visual data in presentations to effectively convey information.', 'duration': 44.978, 'max_score': 4438.929, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj304438929.jpg'}, {'end': 4672.727, 'src': 'embed', 'start': 4645.53, 'weight': 2, 'content': [{'end': 4650.872, 'text': "In sum, let's say this, stories give value to data analyses.", 'start': 4645.53, 'duration': 5.342}, {'end': 4660.273, 'text': "And when you tell the story, you need to make sure that you are addressing your client's goals in a clear, unambiguous way.", 'start': 4652.685, 'duration': 7.588}, {'end': 4664.758, 'text': 'And the overall principle here is be minimally sufficient.', 'start': 4661.094, 'duration': 3.664}, {'end': 4672.727, 'text': 'Get to the point, make it clear, say what you need to, but otherwise be concise and make your message clear.', 'start': 4665.459, 'duration': 7.268}], 'summary': "Stories add value to data analyses by addressing client's goals in a minimally sufficient, clear, and concise manner.", 'duration': 27.197, 'max_score': 4645.53, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj304645530.jpg'}, {'end': 4954.714, 'src': 'embed', 'start': 4900.701, 'weight': 1, 'content': [{'end': 4911.527, 'text': "Some people have proposed adding a fourth circle to this Venn diagram and we'll kind of put that in there and say that social understanding is also important,", 'start': 4900.701, 'duration': 10.826}, {'end': 4914.009, 'text': 'critical really, to valid data science.', 'start': 4911.527, 'duration': 
2.482}, {'end': 4916.23, 'text': 'Now, I love that idea.', 'start': 4914.789, 'duration': 1.441}, {'end': 4920.513, 'text': "And I do think that it's important to understand how things are going to play out.", 'start': 4917.03, 'duration': 3.483}, {'end': 4922.614, 'text': "There's a few kinds of social understanding.", 'start': 4920.973, 'duration': 1.641}, {'end': 4924.895, 'text': "You want to be aware of your client's mission.", 'start': 4922.774, 'duration': 2.121}, {'end': 4929.458, 'text': "You want to make sure that your recommendations are consistent with your client's mission.", 'start': 4925.336, 'duration': 4.122}, {'end': 4933.421, 'text': "Also, that your recommendations are consistent with your client's identity.", 'start': 4930.119, 'duration': 3.302}, {'end': 4936.163, 'text': 'Not just, this is what we do, but this is really who we are.', 'start': 4933.521, 'duration': 2.642}, {'end': 4943.968, 'text': "You need to be aware of the business context, sort of the competitive environment and the regulatory environment that they're working in,", 'start': 4937.165, 'duration': 6.803}, {'end': 4945.389, 'text': 'as well as the social context.', 'start': 4943.968, 'duration': 1.421}, {'end': 4949.991, 'text': 'And that can be outside of the organization, but even more often within the organization.', 'start': 4945.409, 'duration': 4.582}, {'end': 4954.714, 'text': "Your recommendations will affect relationships within the client's organization.", 'start': 4950.452, 'duration': 4.262}], 'summary': "Social understanding is critical to valid data science, including awareness of client's mission, identity, and business context.", 'duration': 54.013, 'max_score': 4900.701, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj304900701.jpg'}], 'start': 4438.929, 'title': 'Effective data presentation', 'summary': "Discusses effective presentation tips including minimal analysis, more charts, and simplifying data. 
it also covers graduate school admissions bias at uc berkeley in 1973, emphasizing the need for storytelling in data analysis. additionally, it highlights aligning data science recommendations with the client's mission and creating impactful presentation graphics.", 'chapters': [{'end': 4483.907, 'start': 4438.929, 'title': 'Effective presentation tips', 'summary': 'Discusses the concept of being minimally sufficient in analysis, providing tips for effective presentations including using more charts, less text, simplifying charts, avoiding tables of data and using examples such as the berkeley admissions dataset.', 'duration': 44.978, 'highlights': ['The concept of being minimally sufficient in analysis is discussed, drawing parallels with the minimal viable product in commerce.', 'Tips for effective presentations are provided, emphasizing the use of more charts and less text, simplifying charts, and avoiding tables of data.', 'An example using the famous Berkeley Admissions dataset is given to illustrate the points about effective presentation techniques.']}, {'end': 4922.614, 'start': 4483.987, 'title': 'Graduate school admissions bias', 'summary': 'Illustrates a case of gender bias in graduate school admissions at the university of california at berkeley in 1973, where men were admitted at a higher rate than women, leading to a lawsuit. 
however, a deeper analysis revealed a paradox as certain programs showed higher acceptance rates for women, ultimately emphasizing the importance of storytelling in data analysis and the need to understand correlation versus causation.', 'duration': 438.627, 'highlights': ["The university's graduate school admissions in 1973 showed a bias in favor of men, with a 44% admission rate for male applicants compared to 35% for female applicants, leading to a significant issue that resulted in a lawsuit.", 'Despite the initial bias, a further breakdown by program revealed a paradox where certain programs showed higher acceptance rates for female applicants, highlighting the importance of a comprehensive analysis before drawing conclusions.', "The chapter emphasizes the value of storytelling in data analysis, highlighting the need to address the client's goals clearly and concisely while providing actionable insights based on the data.", 'It discusses the fundamental philosophical problem of distinguishing between correlation and causation, and presents various methods such as experimental studies, quasi-experiments, theory and experience, as well as the consideration of social factors in data analysis.', 'The importance of social understanding in valid data science is emphasized, indicating the need to comprehend how social factors impact the outcomes of data analysis.']}, {'end': 5361.551, 'start': 4922.774, 'title': 'Data science and presentation graphics', 'summary': "Emphasizes the importance of aligning data science recommendations with the client's mission and identity, understanding the business and social context, and creating clear and impactful presentation graphics to effectively communicate insights, with a focus on achieving the client's goals.", 'duration': 438.777, 'highlights': ["The importance of aligning recommendations with the client's mission and identity", 'Understanding the business and social context for effective recommendations', 'Focus on 
creating clear and impactful presentation graphics']}], 'duration': 922.622, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj304438929.jpg', 'highlights': ['Tips for effective presentations: more charts, less text, and simplified charts', "Importance of aligning data science recommendations with the client's mission", "Value of storytelling in data analysis and addressing client's goals clearly", 'Example using the famous Berkeley Admissions dataset to illustrate presentation techniques', 'Understanding the business and social context for effective recommendations', 'Importance of social understanding in valid data science']}, {'end': 6591.406, 'segs': [{'end': 5445.212, 'src': 'embed', 'start': 5398.746, 'weight': 0, 'content': [{'end': 5404.553, 'text': "But no matter what you're doing, be clear in your graphics and be focused in what you're trying to tell.", 'start': 5398.746, 'duration': 5.807}, {'end': 5405.855, 'text': 'And, above all,', 'start': 5405.114, 'duration': 0.741}, {'end': 5416.932, 'text': "create a strong narrative that gives a different level of perspective and answers questions as you go to anticipate a client's question and to give them the most reliable,", 'start': 5405.855, 'duration': 11.077}, {'end': 5419.777, 'text': 'solid information and the greatest confidence in your analysis.', 'start': 5416.932, 'duration': 2.845}, {'end': 5427.758, 'text': 'The final element of data science and communicating that I wanted to talk about is reproducible research.', 'start': 5421.914, 'duration': 5.844}, {'end': 5429.979, 'text': 'And you can think of it as this idea.', 'start': 5428.638, 'duration': 1.341}, {'end': 5432.6, 'text': 'You want to be able to play that song again.', 'start': 5430.419, 'duration': 2.181}, {'end': 5436.423, 'text': 'And the reason for that is data science projects are rarely one and done.', 'start': 5432.62, 'duration': 3.803}, {'end': 5445.212, 'text': "Rather, they 
tend to be incremental, they tend to be cumulative, and they tend to adapt to the circumstances that they're working in.", 'start': 5437.063, 'duration': 8.149}], 'summary': 'Create clear and focused graphics, strong narrative, and prioritize reproducible research for reliable and adaptable data science projects.', 'duration': 46.466, 'max_score': 5398.746, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj305398746.jpg'}, {'end': 5552.245, 'src': 'embed', 'start': 5523.236, 'weight': 3, 'content': [{'end': 5529.06, 'text': "It's a way of sharing your data and your research with an annotation of how you got through the whole thing with other people.", 'start': 5523.236, 'duration': 5.824}, {'end': 5532.542, 'text': 'It makes the research transparent, which is what we need.', 'start': 5529.46, 'duration': 3.082}, {'end': 5541.9, 'text': 'One of my professional organizations, the Association for Psychological Science, has a major initiative on this called Open Practices,', 'start': 5534.477, 'duration': 7.423}, {'end': 5552.245, 'text': 'where they are strongly encouraging people to share their data as much as is ethically permissible and to absolutely share their methods before they even conduct the study,', 'start': 5541.9, 'duration': 10.345}], 'summary': 'Sharing data and research with annotations promotes transparency and is encouraged by the Association for Psychological Science.', 'duration': 29.009, 'max_score': 5523.236, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj305523236.jpg'}, {'end': 5709.476, 'src': 'embed', 'start': 5683.799, 'weight': 4, 'content': [{'end': 5689.022, 'text': "A really common one, especially if you're using Python, is Jupyter, with a Y there in the middle.", 'start': 5683.799, 'duration': 5.223}, {'end': 5692.204, 'text': 'The Jupyter notebooks are interactive notebooks.', 'start': 5689.842, 'duration': 2.362}, {'end': 5696.347, 
'text': "So here's a screenshot of a very simple one I made in Python.", 'start': 5692.244, 'duration': 4.103}, {'end': 5700.11, 'text': 'And you have titles, you have text, you have the graphics.', 'start': 5697.127, 'duration': 2.983}, {'end': 5706.314, 'text': "If you're working in R, you can do this through something called R Markdown, which works in the same way.", 'start': 5700.79, 'duration': 5.524}, {'end': 5709.476, 'text': 'You do it in RStudio, use Markdown, and you can annotate the whole thing.', 'start': 5706.314, 'duration': 3.162}], 'summary': 'Jupyter notebooks are popular for Python; R uses R Markdown for similar functionality.', 'duration': 25.677, 'max_score': 5683.799, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj305683799.jpg'}, {'end': 6169.966, 'src': 'embed', 'start': 6141.218, 'weight': 6, 'content': [{'end': 6144.8, 'text': 'Next are specific metrics or ways of measuring.', 'start': 6141.218, 'duration': 3.582}, {'end': 6147.822, 'text': 'Now again, there are a few different categories here.', 'start': 6145.581, 'duration': 2.241}, {'end': 6152.565, 'text': 'There are business metrics, there are key performance indicators or KPIs,', 'start': 6148.142, 'duration': 4.423}, {'end': 6157.247, 'text': "there are SMART goals (that's an acronym), and there's also the issue of having multiple goals.", 'start': 6152.565, 'duration': 4.682}, {'end': 6159.929, 'text': "I'll talk about each of those for just a second now.", 'start': 6157.468, 'duration': 2.461}, {'end': 6163.944, 'text': "First off, let's talk about business metrics.", 'start': 6161.724, 'duration': 2.22}, {'end': 6169.966, 'text': "If you're in the commercial world, there are some common ways of measuring success.", 'start': 6164.545, 'duration': 5.421}], 'summary': 'Discussing different categories of metrics including business metrics and SMART goals in the commercial world.', 'duration': 28.748, 'max_score': 6141.218, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj306141218.jpg'}, {'end': 6345.109, 'src': 'embed', 'start': 6320.91, 'weight': 5, 'content': [{'end': 6327.374, 'text': "And that's a way of saying that this is a good goal to be used as a metric for the success of our organization.", 'start': 6320.91, 'duration': 6.464}, {'end': 6334.843, 'text': 'Now the trick, however, is when you have multiple goals, multiple possible endpoints.', 'start': 6328.84, 'duration': 6.003}, {'end': 6337.865, 'text': "And the reason that's difficult is because, well,", 'start': 6335.484, 'duration': 2.381}, {'end': 6345.109, 'text': "it's easy to focus on one goal if you're just trying to maximize revenue or if you're just trying to maximize graduation rate.", 'start': 6337.865, 'duration': 7.244}], 'summary': 'Setting clear goals is crucial for organizational success, but managing multiple goals can be challenging.', 'duration': 24.199, 'max_score': 6320.91, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj306320910.jpg'}, {'end': 6445.304, 'src': 'embed', 'start': 6417.385, 'weight': 7, 'content': [{'end': 6420.11, 'text': "The idea here is that you don't want to have to throw away all your ideas.", 'start': 6417.385, 'duration': 2.725}, {'end': 6421.493, 'text': "You don't want to waste effort.", 'start': 6420.15, 'duration': 1.343}, {'end': 6429.294, 'text': 'One way of doing this in a very quantitative fashion is to make a classification table.', 'start': 6422.809, 'duration': 6.485}, {'end': 6432.015, 'text': 'So what that looks like is this.', 'start': 6430.234, 'duration': 1.781}, {'end': 6436.679, 'text': 'You talk about, for instance, positive results, negative results.', 'start': 6432.516, 'duration': 4.163}, {'end': 6438.84, 'text': "And in fact, let's start by looking at the top here.", 'start': 6436.739, 'duration': 2.101}, {'end': 6445.304, 'text': 'The middle two columns here 
talk about whether an event is present, whether your house is on fire, whether a sale occurs,', 'start': 6439.04, 'duration': 6.264}], 'summary': 'Avoid discarding ideas by using a quantitative classification table for positive and negative results.', 'duration': 27.919, 'max_score': 6417.385, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj306417385.jpg'}], 'start': 5362.131, 'title': 'Data science best practices', 'summary': 'Emphasizes effective data presentation, reproducible research, open data science best practices, data sourcing goals and metrics, and measuring success in organizations. it focuses on clear presentation graphics, reproducibility, open data sharing, goal-setting, kpis, and accuracy measurements.', 'chapters': [{'end': 5481.66, 'start': 5362.131, 'title': 'Effective data presentation & reproducible research', 'summary': 'Emphasizes the importance of clear and focused presentation graphics, creating a strong narrative for data, and the necessity of reproducible research in data science projects for accountability and adaptability.', 'duration': 119.529, 'highlights': ["Creating a strong narrative for data presentation is crucial for giving a different level of perspective and providing reliable, solid information to anticipate client's questions and build confidence in analysis.", 'Reproducible research is essential in data science projects due to their incremental and adaptive nature, as it allows for revising, borrowing from previous studies, and ensuring accountability in both scientific and economic research.', 'Clear and focused presentation graphics are emphasized as being different from graphics used for exploration, with the major goal being to tell a narrative about the data and answer questions, providing the most reliable information to clients.', "The importance of being able to show one's work in data science projects is highlighted, as it allows for potential revisions, 
sharing with others, and ensuring accountability in research.", 'Data science projects are described as rarely being one-time endeavors, but rather incremental and cumulative, emphasizing the need for reproducibility and adaptability in research.']}, {'end': 5746.698, 'start': 5482.08, 'title': 'Open data science best practices', 'summary': 'Emphasizes the importance of open data science, including sharing data and methods, archiving data and code, using non-proprietary formats, and utilizing tools like github and jupyter notebooks to ensure transparency and reproducibility.', 'duration': 264.618, 'highlights': ['The Open Data Science Conference at ODSC.com meets three times a year and is devoted to open data science.', 'The open science framework (osf.io) allows sharing data and research with annotations, promoting transparency and accountability.', 'The Association for Psychological Science has an initiative called Open Practices, strongly encouraging people to share their data and methods.', 'Archiving data sets and code, using non-proprietary formats, and storing files in accessible locations like GitHub are essential for future-proofing research.', 'Utilizing tools like Jupyter notebooks and R Markdown for creating interactive and reproducible research documents.']}, {'end': 6208.308, 'start': 5747.839, 'title': 'Data sourcing: goals and metrics', 'summary': 'Discusses the importance of goal-setting and metrics in data science, emphasizing the need for explicit goals and specific metrics to guide efforts and measure success, with examples from different domains and categories of metrics.', 'duration': 460.469, 'highlights': ['The chapter emphasizes the importance of explicit goals and specific metrics to guide efforts and measure success.', 'The discussion provides examples of success standards in different domains, such as commerce, education, government, and research.', 'Different categories of metrics are explained, including business metrics, key 
performance indicators (KPIs), SMART goals, and the issue of having multiple goals.']}, {'end': 6591.406, 'start': 6209.108, 'title': 'Measuring success in organizations', 'summary': 'Discusses key performance indicators (kpis) and smart goals for measuring success, the challenges of balancing multiple goals, the importance of accuracy in measurements, and provides a detailed breakdown of four different ways of quantifying accuracy using different standards.', 'duration': 382.298, 'highlights': ['The chapter discusses the key performance indicators (KPIs) and SMART goals for measuring success in organizations, emphasizing the importance of simplicity, team-based approach, significant impact, and limited dark side in KPIs, as well as the specific, measurable, assignable, realistic, and time-bound criteria for SMART goals.', 'It addresses the challenge of balancing efforts to reach multiple goals simultaneously and the need for optimization, especially when conflicting goals may impair each other, suggesting mathematical optimization as a method to find the ideal balance of efforts.', 'The importance of accuracy in measurements is emphasized, and a quantitative method involving a classification table is provided to quantify accuracy using different standards such as sensitivity, specificity, positive predictive value, and negative predictive value.']}], 'duration': 1229.275, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj305362131.jpg', 'highlights': ['Creating a strong narrative for data presentation is crucial for providing reliable information (Paragraph 1)', 'Reproducible research is essential in data science projects for accountability and adaptability (Paragraph 1)', 'Clear and focused presentation graphics are emphasized for providing reliable information to clients (Paragraph 1)', 'Open data science best practices include sharing data and research with annotations for transparency (Paragraph 2)', 
'Utilizing tools like Jupyter notebooks and R Markdown for creating reproducible research documents (Paragraph 2)', 'The chapter emphasizes the importance of explicit goals and specific metrics to measure success (Paragraph 3)', 'The chapter discusses the key performance indicators (KPIs) and SMART goals for measuring success in organizations (Paragraph 4)', 'The importance of accuracy in measurements is emphasized, and a quantitative method involving a classification table is provided (Paragraph 4)']}, {'end': 7643.822, 'segs': [{'end': 6622.771, 'src': 'embed', 'start': 6592.286, 'weight': 0, 'content': [{'end': 6596.848, 'text': "Well, here you're looking at true negatives and dividing it by total negatives, the time that it doesn't ring.", 'start': 6592.286, 'duration': 4.562}, {'end': 6599.649, 'text': 'And again you want to maximize that.', 'start': 6597.328, 'duration': 2.321}, {'end': 6605.952, 'text': 'so the true negatives account for all of the negatives, the same way you want the true positives to account for all of the positives, and so on.', 'start': 6599.649, 'duration': 6.303}, {'end': 6609.876, 'text': 'Now you can put numbers on all of these going from 0% to 100%.', 'start': 6607.012, 'duration': 2.864}, {'end': 6614.081, 'text': 'And the idea is to maximize each one as much as you can.', 'start': 6609.876, 'duration': 4.205}, {'end': 6622.771, 'text': "So in sum, from these tables, we get four kinds of accuracy, and there's a different focus for each one.", 'start': 6615.302, 'duration': 7.469}], 'summary': 'Maximize true negatives and true positives for each 0-100% accuracy range.', 'duration': 30.485, 'max_score': 6592.286, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj306592286.jpg'}, {'end': 6671.178, 'src': 'embed', 'start': 6641.085, 'weight': 1, 'content': [{'end': 6645.948, 'text': "Now data sourcing may seem like a very quantitative topic, especially when we're talking about 
measurement.", 'start': 6641.085, 'duration': 4.863}, {'end': 6648.27, 'text': 'But I want to mention one important thing here.', 'start': 6646.489, 'duration': 1.781}, {'end': 6651.032, 'text': 'And that is the social context of measurement.', 'start': 6648.29, 'duration': 2.742}, {'end': 6658.495, 'text': "The idea here really is that people are people, and they all have their own goals, and they're going their own ways.", 'start': 6652.253, 'duration': 6.242}, {'end': 6662.455, 'text': "And we all have our own thoughts and feelings that don't always coincide with each other.", 'start': 6658.595, 'duration': 3.86}, {'end': 6664.596, 'text': 'And this can affect measurement.', 'start': 6663.016, 'duration': 1.58}, {'end': 6671.178, 'text': "And so, for instance, when you're trying to define your goals and you're trying to maximize them, you want to look at things like, for instance,", 'start': 6665.076, 'duration': 6.102}], 'summary': 'Data sourcing includes social context affecting measurement.', 'duration': 30.093, 'max_score': 6641.085, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj306641085.jpg'}, {'end': 6729.578, 'src': 'embed', 'start': 6701.035, 'weight': 2, 'content': [{'end': 6703.396, 'text': 'that may limit the ways that goals can be met.', 'start': 6701.035, 'duration': 2.361}, {'end': 6706.558, 'text': 'Now, most of these make a lot of sense.', 'start': 6703.977, 'duration': 2.581}, {'end': 6711.601, 'text': "So the idea is you can't just do anything you want, you need to have these constraints.", 'start': 6707.139, 'duration': 4.462}, {'end': 6718.385, 'text': "And when you make your recommendations, maybe you'll work creatively in them as long as you're still behaving legally and ethically.", 'start': 6712.182, 'duration': 6.203}, {'end': 6721.507, 'text': 'But you do need to be aware of these constraints.', 'start': 6719.266, 'duration': 2.241}, {'end': 6724.734, 'text': 'Next is the 
environment.', 'start': 6723.281, 'duration': 1.453}, {'end': 6729.578, 'text': 'And the idea here is that competition occurs both between organizations.', 'start': 6725.495, 'duration': 4.083}], 'summary': 'Constraints guide creative work within legal and ethical boundaries.', 'duration': 28.543, 'max_score': 6701.035, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj306701035.jpg'}, {'end': 6838.692, 'src': 'embed', 'start': 6813.809, 'weight': 6, 'content': [{'end': 6821.516, 'text': 'just be aware of these particular issues and be sensitive to them as you both conduct your research and as you make your recommendations.', 'start': 6813.809, 'duration': 7.707}, {'end': 6827.782, 'text': 'So, in sum, social factors affect goals, and they affect the way that you meet those goals.', 'start': 6822.517, 'duration': 5.265}, {'end': 6834.569, 'text': 'There are limits and consequences both on how you reach the goals and, really, on what the goals should be.', 'start': 6828.703, 'duration': 5.866}, {'end': 6838.692, 'text': "And that when you're giving advice on how to reach those goals,", 'start': 6835.309, 'duration': 3.383}], 'summary': 'Social factors impact goal achievement and recommendations, be sensitive and aware.', 'duration': 24.883, 'max_score': 6813.809, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj306813809.jpg'}, {'end': 6967.06, 'src': 'embed', 'start': 6938.737, 'weight': 3, 'content': [{'end': 6944.099, 'text': "So these are things that you need to think about when you're going to use in-house data,", 'start': 6938.737, 'duration': 5.362}, {'end': 6948.521, 'text': 'in terms of how can you use it to facilitate your data science projects.', 'start': 6944.099, 'duration': 4.422}, {'end': 6951.503, 'text': 'Specifically, there are a few pros and cons.', 'start': 6949.862, 'duration': 1.641}, {'end': 6960.733, 'text': 'in house data, potentially 
quick, easy, free, hopefully is standardized, maybe even the original team that conducted this study is still there.', 'start': 6952.523, 'duration': 8.21}, {'end': 6967.06, 'text': 'And you might have identifiers in the data, which make it easier for you to do an individual level analysis.', 'start': 6961.313, 'duration': 5.747}], 'summary': 'In-house data for data science: pros and cons, potentially quick, easy, free, standardized, and with identifiers for individual level analysis.', 'duration': 28.323, 'max_score': 6938.737, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj306938737.jpg'}, {'end': 7194.824, 'src': 'embed', 'start': 7165.892, 'weight': 4, 'content': [{'end': 7170.694, 'text': 'Plus, they can save you some time and effort by actually doing some of the processing for you.', 'start': 7165.892, 'duration': 4.802}, {'end': 7175.115, 'text': 'And that can include things like consumer behaviors and preferences.', 'start': 7171.474, 'duration': 3.641}, {'end': 7179.336, 'text': 'They can get contact information, they can do marketing, identity and finances.', 'start': 7175.115, 'duration': 4.221}, {'end': 7180.116, 'text': "There's a lot of things.", 'start': 7179.336, 'duration': 0.78}, {'end': 7183.978, 'text': "There's a number of data brokers around.", 'start': 7181.136, 'duration': 2.842}, {'end': 7185.179, 'text': "Here's a few of them.", 'start': 7184.438, 'duration': 0.741}, {'end': 7189.481, 'text': 'Acxiom is probably the biggest one in terms of marketing data.', 'start': 7185.559, 'duration': 3.922}, {'end': 7194.824, 'text': "There's also Nielsen, which provides data primarily for media consumption.", 'start': 7190.181, 'duration': 4.643}], 'summary': 'Data brokers can process consumer behavior and preferences, contact info, marketing, identity, and finances, with Acxiom and Nielsen as key players.', 'duration': 28.932, 'max_score': 7165.892, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj307165892.jpg'}, {'end': 7398.136, 'src': 'embed', 'start': 7371.779, 'weight': 5, 'content': [{'end': 7375.462, 'text': "The nice thing about that is that's human readable, but it's even better for machines.", 'start': 7371.779, 'duration': 3.683}, {'end': 7381.086, 'text': 'Then you can take that information and you can send it directly to other programs.', 'start': 7376.783, 'duration': 4.303}, {'end': 7390.232, 'text': "And the nice thing about REST APIs is that they're what's called language agnostic, meaning any programming language can call a REST API,", 'start': 7381.566, 'duration': 8.666}, {'end': 7393.774, 'text': 'can get data from the web and can do whatever it needs to with it.', 'start': 7390.232, 'duration': 3.542}, {'end': 7398.136, 'text': 'Now, there are a few kinds of APIs that are really common.', 'start': 7395.295, 'duration': 2.841}], 'summary': 'Rest apis allow any programming language to call them, making them language agnostic and enabling data transfer to other programs.', 'duration': 26.357, 'max_score': 7371.779, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj307371779.jpg'}], 'start': 6592.286, 'title': 'Data accuracy and sourcing', 'summary': 'Discusses maximizing true positives and true negatives, understanding social context, constraints, competition, and manipulation in recommendations, and compares in-house data versus open data sourcing. 
it also covers the use of data brokers, advantages of using apis, and programming languages for api integration.', 'chapters': [{'end': 6813.809, 'start': 6592.286, 'title': 'Measuring accuracy and social context', 'summary': 'Discusses the importance of maximizing true positives and true negatives, understanding the social context of measurement, and being aware of constraints, competition, and manipulation in making recommendations.', 'duration': 221.523, 'highlights': ['Understanding the social context of measurement', 'Maximizing true positives and true negatives', 'Awareness of constraints, competition, and manipulation']}, {'end': 7143.863, 'start': 6813.809, 'title': 'Data sourcing: in-house vs open data', 'summary': 'Discusses the importance of being sensitive to social factors when setting and reaching goals, and compares the pros and cons of using in-house data versus open data for data sourcing.', 'duration': 330.054, 'highlights': ["Social factors affect goal-setting and reaching, influencing behavior and metrics, and it's crucial to be sensitive to these factors for accurate goal prediction and implementation.", 'Comparing in-house data and open data, the chapter details the pros and cons of each, including speed, ease, and quality control for in-house data, and the wide range of topics and well-documented format for open data.']}, {'end': 7643.822, 'start': 7144.383, 'title': 'Data sourcing and apis in data science', 'summary': 'Discusses the use of data brokers as a source of large amounts of data on various topics, the pros and cons of using data brokers, and the advantages of using apis, particularly rest apis, to directly access web data in data science, with examples of common social and visual apis and programming languages used for api integration.', 'duration': 499.439, 'highlights': ['Data brokers provide an enormous amount of data on various topics and can save time and effort by processing consumer behaviors, preferences, contact 
information, marketing, identity, and finances.', 'Using data brokers can save time and effort and provide individual-level data, but it can be expensive and may be seen as distasteful by some.', 'REST APIs allow programs to directly access web data, retrieve it in JSON format, and are language-agnostic, enabling integration with various programming languages.']}], 'duration': 1051.536, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj306592286.jpg', 'highlights': ['Maximizing true positives and true negatives', 'Understanding the social context of measurement', 'Awareness of constraints, competition, and manipulation', 'Comparing in-house data and open data pros and cons', 'Data brokers provide a vast amount of consumer data', 'REST APIs enable integration with various programming languages', 'Social factors influence behavior and goal prediction accuracy']}, {'end': 9097.494, 'segs': [{'end': 7678.132, 'src': 'embed', 'start': 7643.822, 'weight': 2, 'content': [{'end': 7649.084, 'text': 'I was able to pull data off that web page in a structured format and do a very simple analysis with it.', 'start': 7643.822, 'duration': 5.262}, {'end': 7652.654, 'text': "And let's sum up what we've learned from all of this.", 'start': 7650.672, 'duration': 1.982}, {'end': 7657.239, 'text': 'First off, APIs make it really easy to work with web data.', 'start': 7653.075, 'duration': 4.164}, {'end': 7662.304, 'text': 'They structure it, they call it for you, and then they feed it straight into the programs for you to analyze.', 'start': 7657.579, 'duration': 4.725}, {'end': 7667.59, 'text': "And they're one of the best ways of getting data and getting started in data science.", 'start': 7662.685, 'duration': 4.905}, {'end': 7678.132, 'text': "When you're looking for data, another great way of getting data is through scraping, and what that means is pulling information from web pages.", 'start': 7671.209, 'duration': 6.923}], 
'summary': 'Apis make it easy to work with web data and are a great way to start in data science.', 'duration': 34.31, 'max_score': 7643.822, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj307643822.jpg'}, {'end': 7718.764, 'src': 'embed', 'start': 7688.636, 'weight': 3, 'content': [{'end': 7692.157, 'text': "Now, when you're dealing with scraping, you can get data in several different formats.", 'start': 7688.636, 'duration': 3.521}, {'end': 7698.76, 'text': 'You can get HTML text from web pages, you can get HTML tables, the rows and columns that appear on web pages.', 'start': 7692.617, 'duration': 6.143}, {'end': 7705.241, 'text': 'You can scrape data from PDFs and you can scrape data from all sorts of media like images and video and audio.', 'start': 7699.48, 'duration': 5.761}, {'end': 7710.282, 'text': "Now, we'll make one very important qualification before we say anything else.", 'start': 7706.101, 'duration': 4.181}, {'end': 7713.123, 'text': 'Pay attention to copyright and privacy.', 'start': 7711.022, 'duration': 2.101}, {'end': 7718.764, 'text': "Just because something is on the web doesn't mean you're allowed to pull it out.", 'start': 7713.663, 'duration': 5.101}], 'summary': 'Scraping can retrieve data from various formats including html, pdf, images, videos, and audio. 
copyright and privacy considerations are crucial.', 'duration': 30.128, 'max_score': 7688.636, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj307688636.jpg'}, {'end': 8647.112, 'src': 'embed', 'start': 8617.466, 'weight': 4, 'content': [{'end': 8624.067, 'text': "So regardless of how you got your sample, when they're in your study, you randomly assign them to one condition or another.", 'start': 8617.466, 'duration': 6.601}, {'end': 8628.989, 'text': 'And what that does is it balances out the pre-existing differences between groups.', 'start': 8624.488, 'duration': 4.501}, {'end': 8633.715, 'text': "And that's a great way of taking care of confounds and artifacts,", 'start': 8629.629, 'duration': 4.086}, {'end': 8640.083, 'text': 'the things that are unintentionally associated with differences between groups that provide alternate explanations for your data.', 'start': 8633.715, 'duration': 6.368}, {'end': 8647.112, 'text': "If you've done good random assignment and you have a large enough group of people, then those confounds and artifacts are basically minimized.", 'start': 8640.543, 'duration': 6.569}], 'summary': 'Random assignment helps balance pre-existing differences between groups, minimizing confounds and artifacts in the data.', 'duration': 29.646, 'max_score': 8617.466, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj308617466.jpg'}, {'end': 8699.311, 'src': 'embed', 'start': 8671.367, 'weight': 0, 'content': [{'end': 8677.833, 'text': 'And in all of these, what you find is that experimental research is considered the gold standard for reliable,', 'start': 8671.367, 'duration': 6.466}, {'end': 8679.955, 'text': 'valid information about cause and effect.', 'start': 8677.833, 'duration': 2.122}, {'end': 8685.099, 'text': "On the other hand, while it's a wonderful thing to have, it does come at a cost.", 'start': 8680.655, 'duration': 4.444}, {'end': 
8686.621, 'text': "Here's how that works.", 'start': 8685.74, 'duration': 0.881}, {'end': 8692.666, 'text': 'Number one, experimentation requires extensive specialized training.', 'start': 8687.241, 'duration': 5.425}, {'end': 8694.047, 'text': "It's not a simple thing to pick up.", 'start': 8692.746, 'duration': 1.301}, {'end': 8699.311, 'text': 'Two, experiments are often very time consuming and labor intensive.', 'start': 8695.268, 'duration': 4.043}], 'summary': 'Experimental research is the gold standard for cause and effect, but it requires extensive training and is time-consuming.', 'duration': 27.944, 'max_score': 8671.367, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj308671367.jpg'}, {'end': 8858.817, 'src': 'embed', 'start': 8828.227, 'weight': 1, 'content': [{'end': 8831.792, 'text': 'You can also look at shopping cart value or abandonment.', 'start': 8828.227, 'duration': 3.565}, {'end': 8834.055, 'text': 'A lot of possible outcomes.', 'start': 8832.273, 'duration': 1.782}, {'end': 8843.668, 'text': 'All of these contribute through A-B testing to the general concept of website optimization to make your website as effective as it can possibly be.', 'start': 8834.896, 'duration': 8.772}, {'end': 8849.55, 'text': "Now the idea also is that this is something you're going to do a lot.", 'start': 8846.587, 'duration': 2.963}, {'end': 8853.213, 'text': 'You can perform A-B tests continually.', 'start': 8850.21, 'duration': 3.003}, {'end': 8858.817, 'text': "In fact, I've seen one person say that what A-B testing really stands for is always B testing.", 'start': 8853.333, 'duration': 5.484}], 'summary': 'A-b testing contributes to website optimization for improved effectiveness. 
can be performed continually.', 'duration': 30.59, 'max_score': 8828.227, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj308828227.jpg'}, {'end': 9003.956, 'src': 'embed', 'start': 8974.532, 'weight': 5, 'content': [{'end': 8980.656, 'text': 'You can do surveys in person, and you can also do them online or over the mail or phone or however.', 'start': 8974.532, 'duration': 6.124}, {'end': 8985.2, 'text': "And now it's very common to use software when doing surveys.", 'start': 8981.657, 'duration': 3.543}, {'end': 8991.586, 'text': 'Some really common applications for online surveys are Survey Monkey and Qualtrics,', 'start': 8985.621, 'duration': 5.965}, {'end': 8996.55, 'text': "or at the very simple end there's Google Forms and at the simple and pretty end there's Typeform.", 'start': 8991.586, 'duration': 4.964}, {'end': 9003.956, 'text': "There's a lot more choices but these are some of the major players in how you can get data from online participants in survey format.", 'start': 8996.85, 'duration': 7.106}], 'summary': 'Conduct surveys in person or online using software like survey monkey, qualtrics, google forms, and typeform.', 'duration': 29.424, 'max_score': 8974.532, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj308974532.jpg'}], 'start': 7643.822, 'title': 'Data collection and analysis techniques', 'summary': 'Covers web data scraping, data sourcing techniques, laboratory experimentation, a-b testing, and surveys, emphasizing the importance of apis, sensitivity to copyright and privacy, random assignments, and clear survey design.', 'chapters': [{'end': 7830.557, 'start': 7643.822, 'title': 'Web data scraping and analysis', 'summary': 'Explores the ease and importance of using apis for web data, the process and formats of web scraping, and the methods and tools available for scraping data, emphasizing the importance of copyright and privacy.', 
'duration': 186.735, 'highlights': ['APIs make it really easy to work with web data, structuring and feeding it into programs for analysis.', 'Scraping allows obtaining data in various formats such as HTML text, tables, PDFs, images, and media, emphasizing the importance of respecting copyright and privacy.', 'The chapter discusses the tools and methods for scraping data, including using apps like import.io, Scraper Wiki, Tabula, and scraping in Google Sheets and Excel, as well as coding scrapers in languages like R, Python, Bash, Java, and PHP.']}, {'end': 8514.834, 'start': 7831.217, 'title': 'Data sourcing and scraping techniques', 'summary': 'Discusses scraping data from web pages using google sheets, as well as strategies for making new data through interviews and card sorting tasks, emphasizing the importance of sensitivity to copyright and privacy, and the various methods and considerations involved in each approach, while highlighting the benefits and challenges of each method.', 'duration': 683.617, 'highlights': ['Scraping data from web pages using Google Sheets', 'Strategies for making new data through interviews', 'Card sorting tasks for intuitive organization of information']}, {'end': 8750.104, 'start': 8518.157, 'title': 'Laboratory experimentation for causality', 'summary': 'Discusses the importance of laboratory experiments for determining cause and effect, emphasizing the need for random assignments to balance pre-existing differences between groups, and highlighting the time-consuming and expensive nature of experiments, which are considered the gold standard for reliable causality assessment.', 'duration': 231.947, 'highlights': ['Laboratory experiments are considered the gold standard for reliable, valid information about cause and effect in research in medicine, education, and psychology.', 'Random assignments in experiments balance out pre-existing differences between groups, minimizing confounds and artifacts associated with differences.', 
'Experiments are often time-consuming and labor-intensive, with some taking hours per person, and can be very expensive.', 'Laboratory experimentation is generally considered the best method for assessing causality, but it requires extensive specialized training and justifying the costs for experimentation based on the importance of obtaining reliable cause and effect information.']}, {'end': 9097.494, 'start': 8750.224, 'title': 'A-b testing and surveys', 'summary': 'Covers the concept of a-b testing for website optimization, including the process, tools, and benefits, along with survey techniques and common software options, highlighting the importance of clear, unambiguous questions and bias avoidance in survey design.', 'duration': 347.27, 'highlights': ['A-B testing involves creating multiple versions of web elements and comparing response rates to optimize website design, with common outcomes including time on page, click-throughs, and shopping cart value.', 'A-B testing can be performed continually using software like Optimizely and VWO, emphasizing the constant process of improvement for website effectiveness.', 'Surveys can be conducted in various formats, such as closed-ended or open-ended, and with the help of software like Survey Monkey, Qualtrics, Google Forms, and Typeform, offering an easy way to gather data from large groups of people but requiring special effort to ensure clear, unambiguous questions and response scales.', "It's important to watch out for biased survey techniques, such as push polls, and to avoid bias in question wording, response options, and sample selection."]}], 'duration': 1453.672, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj307643822.jpg', 'highlights': ['Laboratory experiments are considered the gold standard for reliable, valid information about cause and effect in research in medicine, education, and psychology.', 'A-B testing involves creating multiple versions 
of web elements and comparing response rates to optimize website design, with common outcomes including time on page, click-throughs, and shopping cart value.', 'APIs make it really easy to work with web data, structuring and feeding it into programs for analysis.', 'Scraping allows obtaining data in various formats such as HTML text, tables, PDFs, images, and media, emphasizing the importance of respecting copyright and privacy.', 'Random assignments in experiments balance out pre-existing differences between groups, minimizing confounds and artifacts associated with differences.', 'Surveys can be conducted in various formats, such as closed-ended or open-ended, and with the help of software like Survey Monkey, Qualtrics, Google Forms, and Typeform, offering an easy way to gather data from large groups of people but requiring special effort to ensure clear, unambiguous questions and response scales.']}, {'end': 10745.782, 'segs': [{'end': 9260.861, 'src': 'embed', 'start': 9228.518, 'weight': 1, 'content': [{'end': 9229.559, 'text': 'Number one is spreadsheets.', 'start': 9228.518, 'duration': 1.041}, {'end': 9231.14, 'text': "It's the universal data tool.", 'start': 9229.599, 'duration': 1.541}, {'end': 9233.583, 'text': "And I'll talk about how they play an important role in data science.", 'start': 9231.16, 'duration': 2.423}, {'end': 9237.729, 'text': 'Number two is a visualization program called Tableau.', 'start': 9234.727, 'duration': 3.002}, {'end': 9242.971, 'text': "There's Tableau Public, which is free, and there's Tableau Desktop, and there's also something called Tableau Server.", 'start': 9237.769, 'duration': 5.202}, {'end': 9251.216, 'text': "But Tableau is a fabulous program for data visualization, and I'm convinced for most people, provides the great majority of what they need.", 'start': 9243.392, 'duration': 7.824}, {'end': 9257.139, 'text': "And though, while it's not a tool, I do need to talk about the formats used in web data,", 'start': 
9252.316, 'duration': 4.823}, {'end': 9260.861, 'text': 'because you have to be able to navigate that when doing a lot of data science work.', 'start': 9257.139, 'duration': 3.722}], 'summary': 'Spreadsheets and tableau are vital for data science, with tableau being a key visualization tool.', 'duration': 32.343, 'max_score': 9228.518, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj309228518.jpg'}, {'end': 9401.741, 'src': 'embed', 'start': 9376.836, 'weight': 2, 'content': [{'end': 9385.158, 'text': "The important thing about this is you only have to go to the second tool, that's 2 out of 10, so that's B, that's 20% of your tools.", 'start': 9376.836, 'duration': 8.322}, {'end': 9390.699, 'text': "And in this made-up example, you've got 80% of your output.", 'start': 9386.118, 'duration': 4.581}, {'end': 9394.64, 'text': 'So 80% of the output from 20% of the tools.', 'start': 9391.179, 'duration': 3.461}, {'end': 9401.741, 'text': "That's a fictional example of the Pareto Principle, but I find in real life it tends to work something approximately like that.", 'start': 9394.7, 'duration': 7.041}], 'summary': 'Using the pareto principle, 20% of tools yield 80% output.', 'duration': 24.905, 'max_score': 9376.836, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj309376836.jpg'}, {'end': 9529.849, 'src': 'embed', 'start': 9496.625, 'weight': 0, 'content': [{'end': 9497.586, 'text': 'there are a few reasons for that.', 'start': 9496.625, 'duration': 0.961}, {'end': 9500.048, 'text': 'Number one spreadsheets.', 'start': 9498.386, 'duration': 1.662}, {'end': 9500.868, 'text': "they're everywhere.", 'start': 9500.048, 'duration': 0.82}, {'end': 9501.729, 'text': "They're ubiquitous.", 'start': 9500.908, 'duration': 0.821}, {'end': 9504.011, 'text': "They're installed on a billion machines around the world.", 'start': 9501.749, 'duration': 2.262}, {'end': 9506.733, 
'text': 'And everybody uses them.', 'start': 9504.751, 'duration': 1.982}, {'end': 9511.317, 'text': 'They probably have more data sets in spreadsheets than anything else.', 'start': 9507.013, 'duration': 4.304}, {'end': 9514.019, 'text': "And so it's a very common format.", 'start': 9511.857, 'duration': 2.162}, {'end': 9517.142, 'text': "Importantly, it's probably your client's format.", 'start': 9514.74, 'duration': 2.402}, {'end': 9521.724, 'text': 'A lot of your clients are going to be using spreadsheets for their own data.', 'start': 9517.882, 'duration': 3.842}, {'end': 9525.306, 'text': "I've worked with billion dollar companies that keep all of their data in spreadsheets.", 'start': 9521.764, 'duration': 3.542}, {'end': 9529.849, 'text': "And so when you're working with them, you need to know how to manipulate that and how to work with it.", 'start': 9525.826, 'duration': 4.023}], 'summary': 'Spreadsheets are ubiquitous, installed on a billion machines worldwide, with many clients using them for their data.', 'duration': 33.224, 'max_score': 9496.625, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj309496625.jpg'}, {'end': 9804.756, 'src': 'embed', 'start': 9775.928, 'weight': 3, 'content': [{'end': 9781.409, 'text': "So this is the stuff that, while it may be useful for the client's own personal use,", 'start': 9775.928, 'duration': 5.481}, {'end': 9786.031, 'text': "you know you can't feed this into R or Python and it'll just choke and it won't know what to do with it.", 'start': 9781.409, 'duration': 4.622}, {'end': 9789.852, 'text': 'And so you need to go through a process of tidying up the data.', 'start': 9786.611, 'duration': 3.241}, {'end': 9793.953, 'text': 'And, what this involves is undoing some of this stuff.', 'start': 9790.932, 'duration': 3.021}, {'end': 9797.694, 'text': "So, for instance, here's data that is almost tidy.", 'start': 9794.413, 'duration': 3.281}, {'end': 9804.756, 
'text': 'Here we have a single column for the date, a single column for the day, a column for the site.', 'start': 9798.334, 'duration': 6.422}], 'summary': 'Data needs tidying up to be usable in r or python, with single columns for date, day, and site.', 'duration': 28.828, 'max_score': 9775.928, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj309775928.jpg'}, {'end': 10001.032, 'src': 'embed', 'start': 9973.934, 'weight': 4, 'content': [{'end': 9977.957, 'text': "It's a free version of Tableau with one major caveat.", 'start': 9973.934, 'duration': 4.023}, {'end': 9982.74, 'text': "You don't save files locally to your computer, which is why I didn't give you a file to open.", 'start': 9978.497, 'duration': 4.243}, {'end': 9986.903, 'text': 'Instead, it saves them to the web in a public forum.', 'start': 9983.28, 'duration': 3.623}, {'end': 9995.609, 'text': "So if you're willing to trade privacy, you can get an immensely powerful application for data visualization.", 'start': 9987.523, 'duration': 8.086}, {'end': 10001.032, 'text': "That's a catch for a lot of people, which is why people are willing to pay a lot of money for the desktop version.", 'start': 9996.389, 'duration': 4.643}], 'summary': 'Free tableau version lacks local file saving but offers powerful data visualization, leading to paid desktop version demand.', 'duration': 27.098, 'max_score': 9973.934, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj309973934.jpg'}], 'start': 9097.494, 'title': 'Data science tools and techniques, 80-20 rule, and tidying data', 'summary': 'Discusses the importance of data tools in data science, emphasizing key tools like spreadsheets, tableau, r, python, and sql. 
it also highlights the 80-20 rule in data science, focusing on the practical applications of spreadsheets, and the significance of tidy data for analytical programs, along with the introduction to tableau public as a free visualization tool.', 'chapters': [{'end': 9308.144, 'start': 9097.494, 'title': 'Data science tools and techniques', 'summary': 'Discusses the importance of understanding the proper place of data tools in data science, emphasizing the need for familiarity with survey answers and the potential biases, and highlighting the essential tools for data science including spreadsheets, tableau for data visualization, and programming languages such as r, python, and sql.', 'duration': 210.65, 'highlights': ['The importance of understanding the proper place of data tools in data science, emphasizing the need for familiarity with survey answers and the potential biases.', 'The essential tools for data science include spreadsheets, Tableau for data visualization, R, Python, and SQL programming languages.', 'The need for familiarity with survey answers and the potential biases when using surveys to gather data for analysis.', 'The importance of exploring open data sources and considering new data if existing sources do not meet project requirements.', 'The significance of watching for bias to ensure that survey answers are representative of the concerned group.']}, {'end': 9775.108, 'start': 9309.065, 'title': '80-20 rule in data science', 'summary': 'Emphasizes the 80-20 rule, also known as the pareto principle, illustrating that 80% of output can be achieved by focusing on 20% of the tools, particularly highlighting the significance of spreadsheets in data science as demonstrated by their widespread use and practical applications, supported by a fictional example and practical uses of spreadsheets.', 'duration': 466.043, 'highlights': ['Spreadsheets are widely used and practical for data science, as they are installed on a billion machines globally, commonly 
used by clients, and serve as a universal data transfer format.', 'Illustration of the 80-20 rule using a fictional example, emphasizing that 80% of output can be achieved by focusing on 20% of the tools.', 'Practical uses of spreadsheets in data science, including data browsing, sorting, rearranging, finding and replacing, conditional formatting, transposing data, tracking changes, creating pivot tables, and arranging output for consumption.']}, {'end': 10745.782, 'start': 9775.928, 'title': 'Tidying data and tableau public', 'summary': 'Explains the process of tidying up data to make it suitable for analytical programs, emphasizes the significance of tidy data for exporting into an analytical program, and introduces tableau public as a powerful free tool for creating interactive visualizations.', 'duration': 969.854, 'highlights': ['Tidying data involves restructuring it to make it suitable for analytical programs, enabling easier importing and remanipulation.', 'Tableau Public is a free version of Tableau with powerful visualization capabilities, offering the ability to create interactive dashboards and compelling visualizations.', 'SPSS is a desktop program used in academic research, business consulting, management, and medical research, offering point-and-click analyses and graph creation capabilities.']}], 'duration': 1648.288, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj309097494.jpg', 'highlights': ['Spreadsheets are widely used and practical for data science, installed on a billion machines globally.', 'The essential tools for data science include spreadsheets, Tableau, R, Python, and SQL programming languages.', 'Illustration of the 80-20 rule, emphasizing that 80% of output can be achieved by focusing on 20% of the tools.', 'Tidying data involves restructuring it to make it suitable for analytical programs, enabling easier importing and remanipulation.', 'Tableau Public is a free version with 
powerful visualization capabilities, offering interactive dashboards and compelling visualizations.']}, {'end': 11668.358, 'segs': [{'end': 10900.605, 'src': 'embed', 'start': 10870.842, 'weight': 1, 'content': [{'end': 10872.623, 'text': "But it's a text file that can be opened in anything.", 'start': 10870.842, 'duration': 1.781}, {'end': 10876.807, 'text': "And what's beautiful about this is it's really easy to copy and paste.", 'start': 10873.164, 'duration': 3.643}, {'end': 10881.37, 'text': 'And you can even take this into Word and do find and replace on it.', 'start': 10877.047, 'duration': 4.323}, {'end': 10885.093, 'text': "And it's really easy to replicate the analyses.", 'start': 10881.55, 'duration': 3.543}, {'end': 10888.716, 'text': 'And so for me, SPSS is a good program.', 'start': 10885.153, 'duration': 3.563}, {'end': 10893.54, 'text': "But until you use syntax, you don't know the true power of it.", 'start': 10889.137, 'duration': 4.403}, {'end': 10896.182, 'text': 'And it makes your life so much easier as a way of operating it.', 'start': 10893.62, 'duration': 2.562}, {'end': 10900.605, 'text': 'Anyhow, this is my extremely brief introduction to SPSS.', 'start': 10897.043, 'duration': 3.562}], 'summary': 'SPSS allows easy replication of analyses and enhances efficiency through syntax usage.', 'duration': 29.763, 'max_score': 10870.842, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3010870842.jpg'}, {'end': 11165.64, 'src': 'embed', 'start': 11137.615, 'weight': 2, 'content': [{'end': 11144.321, 'text': 'And that is that you can share the information online really well through a program called osf.io.', 'start': 11137.615, 'duration': 6.706}, {'end': 11147.664, 'text': 'That stands for the Open Science Framework.', 'start': 11145.122, 'duration': 2.542}, {'end': 11149.906, 'text': "That's its web address, osf.io.", 'start': 11147.704, 'duration': 2.202}, {'end': 11152.028, 'text': "So 
let's take a quick look at what that's like.", 'start': 11150.227, 'duration': 1.801}, {'end': 11155.331, 'text': "Here's the open science framework website.", 'start': 11153.149, 'duration': 2.182}, {'end': 11157.473, 'text': "And it's a wonderful service.", 'start': 11155.571, 'duration': 1.902}, {'end': 11158.073, 'text': "It's free.", 'start': 11157.533, 'duration': 0.54}, {'end': 11165.64, 'text': "And it's designed to support open, transparent, accessible, accountable, collaborative research.", 'start': 11158.194, 'duration': 7.446}], 'summary': 'Osf.io is a free service supporting open, transparent, and collaborative research.', 'duration': 28.025, 'max_score': 11137.615, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3011137615.jpg'}, {'end': 11342.548, 'src': 'embed', 'start': 11309.959, 'weight': 0, 'content': [{'end': 11312.301, 'text': 'still in beta, still growing rapidly.', 'start': 11309.959, 'duration': 2.342}, {'end': 11318.865, 'text': 'I see it really as an open source free and collaborative replacement to SPSS.', 'start': 11312.961, 'duration': 5.904}, {'end': 11323.348, 'text': "And I think it's going to make data science work so much easier for so many people.", 'start': 11319.285, 'duration': 4.063}, {'end': 11326.47, 'text': 'I strongly recommend you give JASP a close look.', 'start': 11323.768, 'duration': 2.702}, {'end': 11336.486, 'text': "Let's finish up our discussion of coding and data science, the applications part of it by just briefly looking at some other software choices.", 'start': 11328.883, 'duration': 7.603}, {'end': 11342.548, 'text': "And I'll have to admit, it gets kind of overwhelming because there are just so many choices.", 'start': 11337.066, 'duration': 5.482}], 'summary': 'Jasp, an open source software, is rapidly growing as a collaborative replacement to spss, making data science easier.', 'duration': 32.589, 'max_score': 11309.959, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3011309959.jpg'}, {'end': 11498.76, 'src': 'embed', 'start': 11474.082, 'weight': 3, 'content': [{'end': 11480.807, 'text': "Now several applications that are more specifically geared towards data mining, so you don't want to do your regular,", 'start': 11474.082, 'duration': 6.725}, {'end': 11483.009, 'text': 'you know, little t-tests and stuff on these.', 'start': 11480.807, 'duration': 2.202}, {'end': 11484.77, 'text': "But there's RapidMiner.", 'start': 11483.449, 'duration': 1.321}, {'end': 11488.533, 'text': "And there's KNIME and Orange.", 'start': 11484.79, 'duration': 3.743}, {'end': 11493.617, 'text': 'And those are all really nice to use because they are control languages,', 'start': 11489.093, 'duration': 4.524}, {'end': 11498.76, 'text': 'where you drag nodes onto a screen and you connect them with lines and you can see how things run through.', 'start': 11493.617, 'duration': 5.143}], 'summary': 'RapidMiner, KNIME, and Orange are data mining applications in which you drag nodes onto a screen and connect them with lines.', 'duration': 24.678, 'max_score': 11474.082, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3011474082.jpg'}], 'start': 10748.603, 'title': 'Statistical analysis software and data mining tools', 'summary': 'Introduces SPSS and JASP, highlighting the features and potential of JASP as a free alternative to SPSS, discusses data sharing using osf.io, explores alternative software choices, and provides an overview of free and affordable data mining tools including RapidMiner and KNIME.', 'chapters': [{'end': 10980.629, 'start': 10748.603, 'title': 'Introduction to SPSS and JASP', 'summary': 'Introduces the features of SPSS, emphasizing the use of syntax for easier replication of analyses, and highlights the potential of JASP, an open-source and intuitive program with Bayesian approaches, as a free alternative to SPSS.', 'duration': 
232.026, 'highlights': ['SPSS can be used with drop-down menus and text-based syntax commands, making it easier to replicate analyses in the future.', 'JASP is a free, open-source program with Bayesian approaches, providing a promising alternative to SPSS.', 'Saving commands in SPSS through syntax files simplifies the replication of analyses and makes the program more powerful and convenient.', 'SPSS offers options to export information as images, HTML, PDF, or PowerPoint, providing flexibility in sharing and presenting results.', 'The chapter emphasizes the convenience and power of using syntax commands in SPSS for easier replication of analyses.']}, {'end': 11137.175, 'start': 10980.77, 'title': 'Jasp: user-friendly statistical analysis', 'summary': 'Introduces jasp, a user-friendly statistical analysis software that offers a low-fat alternative to spss, includes a familiar layout, and provides a seamless experience for data analysis, allowing for easy modification and customization of analyses.', 'duration': 156.405, 'highlights': ['JASP offers a low-fat alternative to SPSS, making it a user-friendly and efficient statistical analysis software.', 'The software provides a familiar layout similar to SPSS, allowing for easy navigation and utilization for those familiar with SPSS.', 'JASP allows for seamless modification and customization of analyses, including the ability to easily add additional statistics and plots to the output.']}, {'end': 11472.181, 'start': 11137.615, 'title': 'Sharing data and software choices', 'summary': 'Discusses sharing data using osf.io, emphasizing its open, transparent, accessible, and collaborative features, and highlights the potential of jasp as a free and collaborative replacement to spss. 
Additionally, it briefly explores alternative software choices including SAS, Stata, Minitab, MATLAB, Mathematica, and Wolfram Alpha.', 'duration': 334.566, 'highlights': ['The Open Science Framework (osf.io) is a free and collaborative platform designed to support open, transparent, accessible, and accountable research, allowing users to share and collaborate on research findings, and it is highly recommended by the speaker.', 'JASP is highlighted as an open-source, free, and collaborative replacement to SPSS, with the potential to make data science work easier for many people, and the speaker strongly recommends giving it a close look.', 'SAS is mentioned as a common and powerful analytical program, with the SAS University Edition available for students for free, and the program JMP is noted as visualization software, similar to Tableau, albeit with a high cost.', 'Other software choices including Stata, Minitab, MATLAB, Mathematica, and Wolfram Alpha are briefly mentioned, with Wolfram Alpha highlighted for its capabilities in analyses, regression models, and visualizations, especially with the pro account.']}, {'end': 11668.358, 'start': 11474.082, 'title': 'Data mining tools overview', 'summary': 'Provides an overview of free and affordable data mining tools, including RapidMiner, KNIME, Orange, BigML, SOFA Statistics, Past3, StatCrunch, and XLSTAT, emphasizing the importance of functionality, ease of use, community support, and cost in selecting the right tool for individual projects.', 'duration': 194.276, 'highlights': ['The chapter provides an overview of free and affordable data mining tools, including RapidMiner, KNIME, Orange, BigML, SOFA Statistics, Past3, StatCrunch, and XLSTAT.', 'Emphasizes the importance of functionality, ease of use, community support, and cost in selecting the right tool for individual projects.', 'Highlights the unique features of BigML, a browser-based machine learning tool that runs on their servers and offers a 
free version.']}], 'duration': 919.755, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3010748603.jpg', 'highlights': ['JASP is a free, open-source program with Bayesian approaches, providing a promising alternative to SPSS.', 'SPSS can be used with drop-down menus and text-based syntax commands, making it easier to replicate analyses in the future.', 'The Open Science Framework (osf.io) is a free and collaborative platform designed to support open, transparent, accessible, and accountable research.', 'The chapter provides an overview of free and affordable data mining tools, including RapidMiner, KNIME, Orange, BigML, SOFA Statistics, Past3, StatCrunch, and XLSTAT.']}, {'end': 12816.852, 'segs': [{'end': 11702.253, 'src': 'embed', 'start': 11668.698, 'weight': 1, 'content': [{'end': 11671.2, 'text': "So you don't buy them unless somebody else is paying for it.", 'start': 11668.698, 'duration': 2.502}, {'end': 11675.983, 'text': "So these are some of the things that you want to keep in mind when you're trying to look at various programs.", 'start': 11671.88, 'duration': 4.103}, {'end': 11681.11, 'text': "Also, let's mention this: don't forget the 80-20 rule.", 'start': 11676.944, 'duration': 4.166}, {'end': 11686.937, 'text': "You're going to be able to do most of the stuff that you need to do with only a small number of tools.", 'start': 11681.11, 'duration': 5.827}, {'end': 11690.822, 'text': 'One or two, maybe three, will probably be all that you ever need.', 'start': 11686.937, 'duration': 3.885}, {'end': 11694.406, 'text': "So you don't need to explore the range of every possible tool.", 'start': 11690.842, 'duration': 3.564}, {'end': 11702.253, 'text': "Find something that does what you need, find something you're comfortable with, and really try to extract as much value as you can out of that.", 'start': 11695.507, 'duration': 6.746}], 'summary': 'Focus on using a small number of tools to maximize 
value and efficiency.', 'duration': 33.555, 'max_score': 11668.698, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3011668698.jpg'}, {'end': 12014.116, 'src': 'embed', 'start': 11984.352, 'weight': 3, 'content': [{'end': 11992.417, 'text': "HTML defines the structure of a webpage, but if they're feeding data into it, then that will often come in the form of an XML file.", 'start': 11984.352, 'duration': 8.065}, {'end': 12003.905, 'text': "Interestingly, Microsoft Office files, if you have docx or xlsx, x part at the end stands for a version of XML that's used to create these documents.", 'start': 11993.378, 'duration': 10.527}, {'end': 12014.116, 'text': "If you use iTunes, the library information that has all of your artists and your genres and your ratings and stuff, that's all stored in an XML file.", 'start': 12004.745, 'duration': 9.371}], 'summary': 'Html defines webpage structure; data often in xml format, e.g., microsoft office and itunes library info.', 'duration': 29.764, 'max_score': 11984.352, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3011984352.jpg'}, {'end': 12234.702, 'src': 'embed', 'start': 12207.441, 'weight': 2, 'content': [{'end': 12215.307, 'text': 'Because you can think of HTML with its restricted set of tags as sort of a subset of the much freer XML.', 'start': 12207.441, 'duration': 7.866}, {'end': 12222.112, 'text': 'And three, you can convert CSV or your spreadsheet comma separated value to XML and vice versa.', 'start': 12216.328, 'duration': 5.784}, {'end': 12226.976, 'text': "You can bounce them all back and forth because the structure is made clear to the programs that you're working with.", 'start': 12222.433, 'duration': 4.543}, {'end': 12230.918, 'text': "So, in sum, here's what we can say.", 'start': 12228.536, 'duration': 2.382}, {'end': 12234.702, 'text': 'Number one, XML is semi-structured data.', 'start': 
12231.439, 'duration': 3.263}], 'summary': 'Html is a subset of xml. xml allows conversion of csv data. xml is semi-structured data.', 'duration': 27.261, 'max_score': 12207.441, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3012207441.jpg'}, {'end': 12677.64, 'src': 'heatmap', 'start': 12466.178, 'weight': 1, 'content': [{'end': 12469.401, 'text': 'So it says series in quotes, and then a colon.', 'start': 12466.178, 'duration': 3.223}, {'end': 12474.005, 'text': 'And then it gives the piece of information in quotes and a comma and it moves on to the next one.', 'start': 12469.821, 'duration': 4.184}, {'end': 12480.569, 'text': 'And this is a lot more similar to the way that data would be represented in something like R or Python.', 'start': 12474.805, 'duration': 5.764}, {'end': 12483.771, 'text': "And so it's also more compact.", 'start': 12481.209, 'duration': 2.562}, {'end': 12486.452, 'text': "Again, there's things you can do with XML.", 'start': 12484.651, 'duration': 1.801}, {'end': 12492.556, 'text': 'But this is one of the reasons that JSON is generally becoming preferred as a data carrier for websites.', 'start': 12486.912, 'duration': 5.644}, {'end': 12498.048, 'text': "And as you might have guessed, it's really easy to convert between the formats.", 'start': 12494.065, 'duration': 3.983}, {'end': 12503.811, 'text': "It's easy to convert between XML, JSON, CSV, etc.", 'start': 12499.088, 'duration': 4.723}, {'end': 12508.834, 'text': 'And so you can get a web page where you just paste one version in and you get the other version out.', 'start': 12504.371, 'duration': 4.463}, {'end': 12513.597, 'text': "There are some differences, but for the vast majority of situations, they're just kind of interchangeable.", 'start': 12509.314, 'duration': 4.283}, {'end': 12516.851, 'text': 'So, in sum, what do we get from this?', 'start': 12514.71, 'duration': 2.141}, {'end': 12523.832, 'text': 'Like XML, JSON is 
semi-structured data where there are tags that say what the information is.', 'start': 12517.971, 'duration': 5.861}, {'end': 12525.493, 'text': 'but you can define the tags, however you want.', 'start': 12523.832, 'duration': 1.661}, {'end': 12534.655, 'text': 'And JSON is specifically designed for data interchange, and because it reflects the structure of the data in the programs, that makes it really easy.', 'start': 12526.133, 'duration': 8.522}, {'end': 12545.688, 'text': "And then also because it's relatively compact, JSON is replacing gradually XML on the web as a container for data on web pages.", 'start': 12535.595, 'duration': 10.093}, {'end': 12554.756, 'text': "If we're going to talk about coding and data science and the languages that are used, then first and foremost is R.", 'start': 12548.01, 'duration': 6.746}, {'end': 12560.721, 'text': 'The reason for that is, according to many standards, R is the language of data and data science.', 'start': 12554.756, 'duration': 5.965}, {'end': 12562.923, 'text': 'For example, take a look at this chart.', 'start': 12561.422, 'duration': 1.501}, {'end': 12569.709, 'text': 'This is a ranking based on a survey of data mining experts of the software that they use in doing their work.', 'start': 12563.084, 'duration': 6.625}, {'end': 12573.507, 'text': 'And R is right there at the top, R is first.', 'start': 12570.746, 'duration': 2.761}, {'end': 12581.411, 'text': "And in fact, that's important because there's Python, which is usually taken hand in hand with R for data science.", 'start': 12574.228, 'duration': 7.183}, {'end': 12586.994, 'text': 'But R sees 50% more use than Python does, at least in this particular list.', 'start': 12582.191, 'duration': 4.803}, {'end': 12590.035, 'text': "Now there's a few reasons for that popularity.", 'start': 12587.834, 'duration': 2.201}, {'end': 12595.438, 'text': "Number one, R is free and it's open source, both of which make things very easy.", 'start': 12590.215, 
'duration': 5.223}, {'end': 12599.24, 'text': 'Second, R is specially developed for vector operations.', 'start': 12596.519, 'duration': 2.721}, {'end': 12603.342, 'text': "That means it's able to go through an entire list of data without having to write for loops to go through.", 'start': 12599.28, 'duration': 4.062}, {'end': 12608.545, 'text': "If you've ever had to write for loops, you know that that would be kind of disastrous having to do that with data analysis.", 'start': 12603.362, 'duration': 5.183}, {'end': 12612.286, 'text': 'Next. R has a fabulous community behind it.', 'start': 12609.665, 'duration': 2.621}, {'end': 12615.047, 'text': "it's very easy to get help on things with R.", 'start': 12612.286, 'duration': 2.761}, {'end': 12615.687, 'text': 'you Google it.', 'start': 12615.047, 'duration': 0.64}, {'end': 12618.928, 'text': "you're going to end up in a place where you're going to be able to find good examples of what you need.", 'start': 12615.687, 'duration': 3.241}, {'end': 12623.129, 'text': 'And probably most importantly, R is very capable on its own.', 'start': 12619.828, 'duration': 3.301}, {'end': 12631.211, 'text': 'But there are 7000 packages actually many more than that 7000 packages that add capabilities to R.', 'start': 12623.149, 'duration': 8.062}, {'end': 12632.611, 'text': 'Essentially, it can do anything.', 'start': 12631.211, 'duration': 1.4}, {'end': 12637.659, 'text': "Now, when you're working with R, you actually have a choice of interfaces.", 'start': 12633.857, 'duration': 3.802}, {'end': 12645.203, 'text': 'That is how do you actually do the coding and how do you get your results? 
R comes with its own IDE or integrated development environment.', 'start': 12637.679, 'duration': 7.524}, {'end': 12646.964, 'text': 'You can do that.', 'start': 12646.124, 'duration': 0.84}, {'end': 12652.067, 'text': "Or if you're on a Mac or Linux, you can actually do R through the terminal through a command line.", 'start': 12647.324, 'duration': 4.743}, {'end': 12654.769, 'text': "If you've installed R, you just type R and it starts up.", 'start': 12652.247, 'duration': 2.522}, {'end': 12659.831, 'text': "There's also a very popular development environment called RStudio.", 'start': 12655.649, 'duration': 4.182}, {'end': 12662.632, 'text': "And that's actually the one that I use, and I'll be using for all my examples.", 'start': 12659.851, 'duration': 2.781}, {'end': 12669.496, 'text': 'But another new competitor is Jupyter, which is very commonly used for Python and is what I use for examples there.', 'start': 12663.293, 'duration': 6.203}, {'end': 12672.617, 'text': "It works in a browser window, even though it's locally installed.", 'start': 12670.176, 'duration': 2.441}, {'end': 12677.64, 'text': "And RStudio and Jupyter, there are pluses and minuses to each one of them; I'll mention them as we get to them.", 'start': 12673.418, 'duration': 4.222}], 'summary': 'JSON is generally becoming preferred for web data; R ranks first in the cited survey of data mining experts, with 7000+ packages and multiple interfaces such as RStudio and Jupyter.', 'duration': 211.462, 'max_score': 12466.178, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3012466178.jpg'}, {'end': 12534.655, 'src': 'embed', 'start': 12504.371, 'weight': 4, 'content': [{'end': 12508.834, 'text': 'And so you can get a web page where you just paste one version in and you get the other version out.', 'start': 12504.371, 'duration': 4.463}, {'end': 12513.597, 'text': "There are some differences, but for the vast majority of situations, they're just kind of interchangeable.", 'start': 12509.314, 
'duration': 4.283}, {'end': 12516.851, 'text': 'So, in sum, what do we get from this?', 'start': 12514.71, 'duration': 2.141}, {'end': 12523.832, 'text': 'Like XML, JSON is semi-structured data where there are tags that say what the information is.', 'start': 12517.971, 'duration': 5.861}, {'end': 12525.493, 'text': 'but you can define the tags, however you want.', 'start': 12523.832, 'duration': 1.661}, {'end': 12534.655, 'text': 'And JSON is specifically designed for data interchange, and because it reflects the structure of the data in the programs, that makes it really easy.', 'start': 12526.133, 'duration': 8.522}], 'summary': 'Json enables easy data interchange, reflecting data structure in programs.', 'duration': 30.284, 'max_score': 12504.371, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3012504371.jpg'}, {'end': 12581.411, 'src': 'embed', 'start': 12554.756, 'weight': 0, 'content': [{'end': 12560.721, 'text': 'The reason for that is, according to many standards, R is the language of data and data science.', 'start': 12554.756, 'duration': 5.965}, {'end': 12562.923, 'text': 'For example, take a look at this chart.', 'start': 12561.422, 'duration': 1.501}, {'end': 12569.709, 'text': 'This is a ranking based on a survey of data mining experts of the software that they use in doing their work.', 'start': 12563.084, 'duration': 6.625}, {'end': 12573.507, 'text': 'And R is right there at the top, R is first.', 'start': 12570.746, 'duration': 2.761}, {'end': 12581.411, 'text': "And in fact, that's important because there's Python, which is usually taken hand in hand with R for data science.", 'start': 12574.228, 'duration': 7.183}], 'summary': 'R is ranked first by data mining experts, making it crucial for data and data science.', 'duration': 26.655, 'max_score': 12554.756, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3012554756.jpg'}], 'start': 
11668.698, 'title': 'Choosing tools and working with HTML, XML, JSON, and R in data science', 'summary': 'Emphasizes the 80/20 rule for tool selection in data science, the significance of HTML and XML, and the popularity of R in data science. It covers the importance of understanding HTML and XML, their applications in web data, and the comparison of XML and JSON as semi-structured data.', 'chapters': [{'end': 11729.612, 'start': 11668.698, 'title': 'Choosing the right tools for data science', 'summary': 'Emphasizes the importance of the 80/20 rule in selecting a small number of tools to efficiently perform tasks, and the significance of finding tools that meet your needs and provide value for coding and data science applications.', 'duration': 60.914, 'highlights': ['The 80/20 rule suggests that most tasks can be accomplished with a small number of tools, typically one or two, and occasionally three, reducing the need to explore a wide range of tools.', 'Emphasizes the importance of finding tools that meet your needs and provide value, rather than pursuing every possible tool available for coding and data science.', 'The chapter underscores the significance of selecting tools based on individual comfort and the ability to extract value, highlighting the importance of personal preferences and goals in the decision-making process.']}, {'end': 12096.472, 'start': 11729.612, 'title': 'Working with HTML and XML in data science', 'summary': 'Emphasizes the importance of understanding HTML and XML in data science, highlighting that HTML defines web page structure and content, while XML provides flexibility in defining data, with examples of applications in web data, Microsoft Office files, iTunes library, and program data files.', 'duration': 366.86, 'highlights': ['HTML defines the page structure and content, while XML provides flexibility in defining data', 'Examples of applications of XML in web data, Microsoft Office files, iTunes library, and program data files', 
"Explanation of XML's semi-structured data nature and its use of opening and closing angle brackets for tags", 'Description of HTML as the language that makes the World Wide Web function', 'Demonstration of a quick data set example from airgast.com to illustrate XML usage']}, {'end': 12816.852, 'start': 12097.572, 'title': 'Web data and coding: xml, json, and r', 'summary': 'Discusses the structured xml data used in web pages, the flexibility of converting xml to csv and html, the comparison of xml and json as semi-structured data, and the popularity of r in data science based on its survey ranking and key features.', 'duration': 719.28, 'highlights': ['R is the top language for data science, as per a survey of data mining experts, with 50% more usage than Python.', 'XML data can be easily converted to CSV, HTML, and vice versa, providing flexibility in data manipulation.', 'JSON, like XML, is semi-structured data and is gradually replacing XML on the web as a data carrier.', 'The structured XML data is displayed in web pages by default, and APIs can be used to access and work with the XML data.']}], 'duration': 1148.154, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3011668698.jpg', 'highlights': ['R is the top language for data science, with 50% more usage than Python.', 'The 8020 rule suggests that most tasks can be accomplished with a small number of tools.', 'XML data can be easily converted to CSV, HTML, and vice versa, providing flexibility in data manipulation.', 'HTML defines the page structure and content, while XML provides flexibility in defining data.', 'JSON, like XML, is semi-structured data and is gradually replacing XML on the web as a data carrier.']}, {'end': 14270.816, 'segs': [{'end': 12939.477, 'src': 'embed', 'start': 12911.956, 'weight': 2, 'content': [{'end': 12917.278, 'text': 'Also, Python has a fabulous community around it with hundreds of 1000s of people involved.', 'start': 
12911.956, 'duration': 5.322}, {'end': 12920.519, 'text': 'And also Python has thousands of packages.', 'start': 12917.898, 'duration': 2.621}, {'end': 12923.66, 'text': 'Now it actually has something like 70,000 or 80,000 packages.', 'start': 12920.939, 'duration': 2.721}, {'end': 12930.946, 'text': 'But in terms of ones that are specific to data, there are still thousands available that give it some incredible capabilities.', 'start': 12923.7, 'duration': 7.246}, {'end': 12934.27, 'text': 'Now, a couple of things to know about Python.', 'start': 12932.628, 'duration': 1.642}, {'end': 12935.932, 'text': 'First is about versions.', 'start': 12934.611, 'duration': 1.321}, {'end': 12939.477, 'text': 'There are two versions of Python that are in wide circulation.', 'start': 12936.393, 'duration': 3.084}], 'summary': 'Python has a large community and 70,000+ packages, with thousands specific to data.', 'duration': 27.521, 'max_score': 12911.956, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3012911956.jpg'}, {'end': 13086.285, 'src': 'embed', 'start': 13062.153, 'weight': 1, 'content': [{'end': 13070.998, 'text': "Python is familiar to millions of coders because it's very often a first programming language that people learn to do general purpose programming.", 'start': 13062.153, 'duration': 8.845}, {'end': 13077.201, 'text': 'And there are a lot of very simple adaptations for data that make it very powerful for data science work.', 'start': 13071.398, 'duration': 5.803}, {'end': 13079.983, 'text': 'So let me say something else.', 'start': 13078.462, 'duration': 1.521}, {'end': 13083.044, 'text': 'Again, data science loves Jupyter.', 'start': 13080.023, 'duration': 3.021}, {'end': 13086.285, 'text': 'And Jupyter is the browser-based framework.', 'start': 13083.524, 'duration': 2.761}], 'summary': 'Python is popular among coders as a first language, and its simple adaptations for data make it powerful for data science 
work. Jupyter is favored in data science for its browser-based framework.', 'duration': 24.132, 'max_score': 13062.153, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3013062153.jpg'}, {'end': 13208.776, 'src': 'embed', 'start': 13179.194, 'weight': 0, 'content': [{'end': 13181.476, 'text': 'So in sum, we can say a few things.', 'start': 13179.194, 'duration': 2.282}, {'end': 13186.579, 'text': 'Number one, Python is a very popular language familiar to millions of people.', 'start': 13182.176, 'duration': 4.403}, {'end': 13187.64, 'text': 'And that makes it a good choice.', 'start': 13186.599, 'duration': 1.041}, {'end': 13192.964, 'text': 'Second, of all the languages we use for data science on a frequent basis,', 'start': 13188.521, 'duration': 4.443}, {'end': 13198.308, 'text': "this is the only one that's general purpose, which means it can be used for a lot of things other than processing data.", 'start': 13192.964, 'duration': 5.344}, {'end': 13206.714, 'text': 'And it gets its power, like R does, from having thousands of contributed packages, which greatly expand its capabilities,', 'start': 13199.169, 'duration': 7.545}, {'end': 13208.776, 'text': 'especially in terms of doing data science work.', 'start': 13206.714, 'duration': 2.062}], 'summary': 'Python is popular, versatile, and has thousands of packages, making it a great choice for data science.', 'duration': 29.582, 'max_score': 13179.194, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3013179194.jpg'}, {'end': 13482.127, 'src': 'embed', 'start': 13450.031, 'weight': 3, 'content': [{'end': 13452.052, 'text': 'You want to select it, you want to organize it.', 'start': 13450.031, 'duration': 2.021}, {'end': 13460.235, 'text': "And then what you're going to do is you're going to send the data to your program of choice for further analysis like R or Python or whatever.", 'start': 13452.572, 
'duration': 7.663}, {'end': 13463.677, 'text': "So in sum, here's what we can say about SQL.", 'start': 13461.176, 'duration': 2.501}, {'end': 13473.462, 'text': "Number one, as a language, it's generally associated with relational databases, which are very efficient and well structured ways of storing data.", 'start': 13464.537, 'duration': 8.925}, {'end': 13477.744, 'text': 'Just a handful of basic commands can be extremely useful.', 'start': 13474.383, 'duration': 3.361}, {'end': 13482.127, 'text': "When working with databases, you don't have to be a super ninja expert, really.", 'start': 13478.125, 'duration': 4.002}], 'summary': 'Sql is associated with relational databases, efficient for data storage, and requires only basic commands for use.', 'duration': 32.096, 'max_score': 13450.031, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3013450031.jpg'}, {'end': 13661.455, 'src': 'embed', 'start': 13633.116, 'weight': 4, 'content': [{'end': 13638.139, 'text': 'the work is on the front end or closer to the high level languages like R or Python.', 'start': 13633.116, 'duration': 5.023}, {'end': 13647.825, 'text': 'In sum, C, C++ and Java form a foundational bedrock and the back end of data and data science.', 'start': 13639.199, 'duration': 8.626}, {'end': 13652.188, 'text': "And they do this because they're very fast, and they are very reliable.", 'start': 13648.386, 'duration': 3.802}, {'end': 13661.455, 'text': 'On the other hand, given their nature, that work is typically reserved for the engineers who are working with the equipment that runs in the back.', 'start': 13653.009, 'duration': 8.446}], 'summary': 'C, c++, and java are foundational for back-end data work, known for speed and reliability.', 'duration': 28.339, 'max_score': 13633.116, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3013633116.jpg'}, {'end': 13702.313, 'src': 'embed', 'start': 
13675.873, 'weight': 6, 'content': [{'end': 13685.022, 'text': 'And bash really is a great example of old tools that have survived and are still being used actively and productively with new data.', 'start': 13675.873, 'duration': 9.149}, {'end': 13687.341, 'text': 'You can think of it this way.', 'start': 13686.36, 'duration': 0.981}, {'end': 13689.042, 'text': "It's almost like typing on your typewriter.", 'start': 13687.581, 'duration': 1.461}, {'end': 13691.044, 'text': "You're working at the command line.", 'start': 13689.463, 'duration': 1.581}, {'end': 13695.287, 'text': "You're typing out code through a command line interface or CLI.", 'start': 13691.064, 'duration': 4.223}, {'end': 13702.313, 'text': 'Now, this method of interacting with computers practically goes back to the typewriter phase because it predates monitors.', 'start': 13696.028, 'duration': 6.285}], 'summary': 'Bash, an old tool, is still actively used for coding in cli, predating monitors.', 'duration': 26.44, 'max_score': 13675.873, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3013675873.jpg'}, {'end': 14020.578, 'src': 'embed', 'start': 13975.934, 'weight': 7, 'content': [{'end': 13982.318, 'text': "So let's say this in sum, Despite being in one sense as old as the dinosaurs, the command line survives,", 'start': 13975.934, 'duration': 6.384}, {'end': 13988.579, 'text': 'because it is extremely well evolved and well suited to its purpose of working with data.', 'start': 13982.318, 'duration': 6.261}, {'end': 13993.78, 'text': 'The utilities, both the built in and the installables are fast, and they are easy.', 'start': 13989.199, 'duration': 4.581}, {'end': 13996.821, 'text': 'And generally, they do one thing and they do it very, very well.', 'start': 13994.1, 'duration': 2.721}, {'end': 14005.503, 'text': 'And then, surprisingly, there is an enormous amount of very active development of command line utilities for these purposes,', 
'start': 13997.261, 'duration': 8.242}, {'end': 14007.183, 'text': 'especially with data science.', 'start': 14005.503, 'duration': 1.68}, {'end': 14014.733, 'text': "One critical task when you're coding in data science is to be able to find the things that you're looking for.", 'start': 14009.969, 'duration': 4.764}, {'end': 14020.578, 'text': 'And regex, which is short for regular expressions, is a wonderful way to do that.', 'start': 14015.354, 'duration': 5.224}], 'summary': 'Command line utilities are well-evolved, fast, and actively developed, especially for data science tasks like using regex for finding information.', 'duration': 44.644, 'max_score': 13975.934, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3013975934.jpg'}], 'start': 12816.952, 'title': 'Python, jupyter, sql, c/c++, java in data science', 'summary': 'Discusses the significance of python, its popularity, versatility, and community support with over 70,000 packages, its use in jupyter for data science, the importance of sql, and the usage of c, c++, and java for production-level code in data science applications.', 'chapters': [{'end': 12983.044, 'start': 12816.952, 'title': 'Python in data science', 'summary': 'Discusses the significance of python in data science, highlighting its popularity, versatility, and community support, with over 70,000 packages, its general-purpose nature, and the compatibility issues between python 2.x and 3.x, which leads to widespread use of the former.', 'duration': 166.092, 'highlights': ["Python's popularity and versatility", 'Compatibility issues between Python 2.x and 3.x', "Significance of Python's community and packages"]}, {'end': 13607.862, 'start': 12983.744, 'title': 'Python, jupyter, sql, c/c++, java in data science', 'summary': 'Discusses the use of python in jupyter for data science, the popularity and adaptability of python, the importance of sql in data science, and the usage of c, c++, and 
java for fast and stable production-level code in data science applications.', 'duration': 624.118, 'highlights': ['Python and Jupyter are popular choices for data science, with specialized distributions like Continuum Anaconda and Enthought making it easy to work with data.', 'Python is a popular, general-purpose programming language, familiar to millions, and offers numerous packages for data science, including NumPy, SciPy, Matplotlib, Seaborn, and scikit-learn.', 'SQL is essential in data science for working with relational databases, and common choices include Oracle Database, Microsoft SQL Server, MySQL, and PostgreSQL.', 'C, C++, and Java are utilized in data science for fast and stable production-level code, with C being benchmarked for speed and Java being the most popular overall programming language.']}, {'end': 13890.384, 'start': 13607.862, 'title': 'Coding in data science', 'summary': 'Discusses the use of foundational languages like c, c++, and java in the back end of data science, the importance of bash for command line interaction, and the specific utilities used in bash, such as cat, awk, grep, and sed.', 'duration': 282.522, 'highlights': ['The foundational languages C, C++, and Java are used in the back end of data science due to their speed and reliability.', 'Bash, an old tool, is still actively being used for command line interaction, allowing users to run different languages at the command line.', 'Specific utilities in bash, such as cat, awk, grep, and SED, are used for text processing, searching, and transforming text.']}, {'end': 14270.816, 'start': 13890.384, 'title': 'Command line utilities for data science', 'summary': 'Discusses the use of command line utilities for data science, emphasizing the significance of regex in searching and manipulating text data, and highlighting the active development of new utilities for the command line.', 'duration': 380.432, 'highlights': ['The command line utilities for data science are evolving and 
well-suited to its purpose, with both built-in and installable options, such as jq for JSON data, JSON to CSV conversion, Rio for statistical commands, and MLR for accessing machine learning servers.', 'The use of regex in data science is highlighted, with examples of regex patterns for searching and manipulating text data, demonstrating its utility in finding and extracting specific patterns within text data.', 'The introduction of a regex golf game website, regex.alf.nu, is mentioned as a platform for practicing and improving regex skills in a gamified format, providing a practical and interactive approach to learning regex.']}], 'duration': 1453.864, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3012816952.jpg', 'highlights': ["Python's popularity and versatility", 'Python and Jupyter are popular choices for data science', "Significance of Python's community and over 70,000 packages", 'SQL is essential in data science for working with relational databases', 'C, C++, and Java are utilized in data science for fast and stable production-level code', 'The foundational languages C, C++, and Java are used in the back end of data science', 'Bash is still actively being used for command line interaction', 'Command line utilities for data science are evolving and well-suited to its purpose', 'The use of regex in data science is highlighted']}, {'end': 16575.747, 'segs': [{'end': 14293.495, 'src': 'embed', 'start': 14271.076, 'weight': 1, 'content': [{'end': 14280.485, 'text': "And it's a great way of learning how to do regular expressions and learning how to search in a way that's going to get you the data that you need for your projects.", 'start': 14271.076, 'duration': 9.409}, {'end': 14287.21, 'text': 'So in sum, regex or regular expressions help you find the right data for your project.', 'start': 14281.165, 'duration': 6.045}, {'end': 14290.053, 'text': "They're very powerful, and they're very flexible.", 
'start': 14287.631, 'duration': 2.422}, {'end': 14293.495, 'text': 'Now on the other hand, they are cryptic, at least when you first look at them.', 'start': 14290.513, 'duration': 2.982}], 'summary': 'Regular expressions are powerful and flexible for data search in projects.', 'duration': 22.419, 'max_score': 14271.076, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3014271076.jpg'}, {'end': 14403.387, 'src': 'embed', 'start': 14356.658, 'weight': 2, 'content': [{'end': 14360.782, 'text': 'Number one, in terms of apps, or specifically built applications,', 'start': 14356.658, 'duration': 4.124}, {'end': 14369.149, 'text': 'Excel and Tableau are really fundamental for both getting the data from clients and doing some basic data browsing.', 'start': 14360.782, 'duration': 8.367}, {'end': 14373.133, 'text': 'And Tableau is really wonderful for interactive data visualization.', 'start': 14369.71, 'duration': 3.423}, {'end': 14376.936, 'text': 'I strongly recommend that you get very comfortable with both of those.', 'start': 14373.233, 'duration': 3.703}, {'end': 14386.894, 'text': "In terms of code, it's a good idea to learn either R or Python, or ideally to learn both, because you can use them hand in hand.", 'start': 14378.007, 'duration': 8.887}, {'end': 14393.139, 'text': "In terms of utilities, it's a great idea to learn how to work with Bash, the command line utility.", 'start': 14388.015, 'duration': 5.124}, {'end': 14399.824, 'text': 'And learn to use regular expressions, or regex; you can actually use those in lots and lots of programs.', 'start': 14393.799, 'duration': 6.025}, {'end': 14403.387, 'text': 'And so they can have a very wide application.', 'start': 14400.365, 'duration': 3.022}], 'summary': 'Excel and Tableau are fundamental for data; learn R or Python, Bash, and regex.', 'duration': 46.729, 'max_score': 14356.658, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3014356658.jpg'}, {'end': 14477.767, 'src': 'embed', 'start': 14449.331, 'weight': 5, 'content': [{'end': 14456.536, 'text': 'And I can tell you, really, the goal is meaning: extracting meaning out of your data to make informed choices.', 'start': 14449.331, 'duration': 7.205}, {'end': 14458.137, 'text': "In fact, I'll say a little more.", 'start': 14456.556, 'duration': 1.581}, {'end': 14463.881, 'text': 'The goal is always meaning. And so, with that, I strongly encourage you to get some tools,', 'start': 14458.137, 'duration': 5.744}, {'end': 14467.402, 'text': "get started in data science and start finding meaning in the data that's around you.", 'start': 14463.881, 'duration': 3.521}, {'end': 14471.944, 'text': 'Welcome to Mathematics and Data Science.', 'start': 14469.683, 'duration': 2.261}, {'end': 14477.767, 'text': "I'm Barton Poulson, and we're going to talk about how mathematics matters for data science.", 'start': 14472.204, 'duration': 5.563}], 'summary': 'The goal is to extract meaning from data for informed choices.', 'duration': 28.436, 'max_score': 14449.331, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3014449331.jpg'}, {'end': 14529.652, 'src': 'embed', 'start': 14501.161, 'weight': 6, 'content': [{'end': 14507.203, 'text': "So we're going to talk about some of the basic elements of mathematics really at a conceptual level and how they apply to data science.", 'start': 14501.161, 'duration': 6.042}, {'end': 14511.762, 'text': 'There are a few ways that math really matters to data science.', 'start': 14508.561, 'duration': 3.201}, {'end': 14513.523, 'text': 'Number one,', 'start': 14512.062, 'duration': 1.461}, {'end': 14522.206, 'text': "it allows you to know which procedures to use and why, so you can answer your questions in a way that's the most informative and most useful.", 'start': 14513.523, 
'duration': 8.683}, {'end': 14529.652, 'text': "Two if you have a good understanding of math, then you know what to do when things don't work right,", 'start': 14523.43, 'duration': 6.222}], 'summary': 'Understanding math in data science informs procedures and problem-solving.', 'duration': 28.491, 'max_score': 14501.161, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3014501161.jpg'}, {'end': 14572.329, 'src': 'embed', 'start': 14545.718, 'weight': 4, 'content': [{'end': 14552.301, 'text': "And so for all three of these reasons, it's helpful to have at least a grounding in mathematics if you're going to do work in data science.", 'start': 14545.718, 'duration': 6.583}, {'end': 14557.041, 'text': 'Now probably the most important thing to start with is algebra.', 'start': 14554.099, 'duration': 2.942}, {'end': 14560.402, 'text': 'And there are three kinds of algebra that we want to mention.', 'start': 14558.201, 'duration': 2.201}, {'end': 14564.965, 'text': "The first is elementary algebra, that's the regular x plus y.", 'start': 14560.443, 'duration': 4.522}, {'end': 14568.587, 'text': "Then there's linear or matrix algebra, which looks more complex,", 'start': 14564.965, 'duration': 3.622}, {'end': 14572.329, 'text': "but it's conceptually simpler and it's used by computers to actually do the calculations.", 'start': 14568.587, 'duration': 3.742}], 'summary': 'Mathematics is essential for data science, especially algebra, including elementary and linear/matrix algebra.', 'duration': 26.611, 'max_score': 14545.718, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3014545718.jpg'}, {'end': 15174.409, 'src': 'embed', 'start': 15146.903, 'weight': 7, 'content': [{'end': 15151.565, 'text': "And it's conceptually simpler because you can put it all there in this tight formation.", 'start': 15146.903, 'duration': 4.662}, {'end': 15157.868, 'text': "In fact, it's 
a very compact notation, and allows you to manipulate entire collections of numbers pretty easily.", 'start': 15151.625, 'duration': 6.243}, {'end': 15162.63, 'text': "And that's the major benefit of learning a little bit about linear or matrix algebra.", 'start': 15158.188, 'duration': 4.442}, {'end': 15171.268, 'text': 'Our next step in mathematics for data science foundations is systems of linear equations.', 'start': 15165.365, 'duration': 5.903}, {'end': 15174.409, 'text': "And maybe you're familiar with this, but maybe you're not.", 'start': 15172.148, 'duration': 2.261}], 'summary': 'Learning linear algebra simplifies manipulating collections of numbers and solving systems of linear equations.', 'duration': 27.506, 'max_score': 15146.903, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3015146903.jpg'}, {'end': 15476.928, 'src': 'embed', 'start': 15451.092, 'weight': 8, 'content': [{'end': 15458.736, 'text': "There's our intersection and it's at 60 on the number of cases sold at $20 and 940 on the number of cases sold at $5.", 'start': 15451.092, 'duration': 7.644}, {'end': 15463.579, 'text': 'And that also represents the solution of these joint equations.', 'start': 15458.736, 'duration': 4.843}, {'end': 15468.282, 'text': "And so it's a graphical way of solving a system of linear equations.", 'start': 15464.219, 'duration': 4.063}, {'end': 15476.928, 'text': 'So in sum, systems of linear equations allow us to balance several unknowns and find the unique solution.', 'start': 15469.505, 'duration': 7.423}], 'summary': 'Solving linear equations graphically yields unique solution for unknowns.', 'duration': 25.836, 'max_score': 15451.092, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3015451092.jpg'}, {'end': 16320.012, 'src': 'embed', 'start': 16242.406, 'weight': 0, 'content': [{'end': 16246.556, 'text': 'And that increases total revenue by 7%.', 
'start': 16242.406, 'duration': 4.15}, {'end': 16250.9, 'text': 'And so we can optimize the price to get the maximum total revenue.', 'start': 16246.556, 'duration': 4.344}, {'end': 16254.683, 'text': 'And it has to do with this little bit of calculus and the derivative of a function.', 'start': 16251.26, 'duration': 3.423}, {'end': 16262.229, 'text': 'So in sum, calculus can be used to find the minima and the maxima of functions, including prices.', 'start': 16255.724, 'duration': 6.505}, {'end': 16264.571, 'text': 'It allows for optimization.', 'start': 16262.89, 'duration': 1.681}, {'end': 16267.794, 'text': 'And that in turn allows you to make better business decisions.', 'start': 16265.111, 'duration': 2.683}, {'end': 16274.659, 'text': 'Our next topic in mathematics and data principles is something called Big O.', 'start': 16270.476, 'duration': 4.183}, {'end': 16280.972, 'text': "And if you're wondering what big O is all about, well, it is about time.", 'start': 16276.209, 'duration': 4.763}, {'end': 16288.036, 'text': "Or you can think of it as how long does it take to do a particular operation? 
It's the speed of the operation.", 'start': 16281.632, 'duration': 6.404}, {'end': 16297.921, 'text': 'If you want to be really precise, the growth rate of a function, how much more it requires as you add elements, is called its order.', 'start': 16288.656, 'duration': 9.265}, {'end': 16299.883, 'text': "That's why it's Big O: the O is for order.", 'start': 16297.961, 'duration': 1.922}, {'end': 16305.965, 'text': 'And Big O gives the rate at which things grow as the number of elements grows.', 'start': 16300.783, 'duration': 5.182}, {'end': 16309.107, 'text': "And what's funny is there can be really surprising differences.", 'start': 16306.466, 'duration': 2.641}, {'end': 16315.19, 'text': 'Let me show you how it works with a few different kinds of growth rates or Big O.', 'start': 16309.807, 'duration': 5.383}, {'end': 16320.012, 'text': "First off, there are the ones that I say are sort of just on the spot: you can get stuff done right away.", 'start': 16315.19, 'duration': 4.822}], 'summary': 'Using calculus for price optimization increased revenue by 7%. Big O measures growth rates for efficient operations.', 'duration': 77.606, 'max_score': 16242.406, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3016242406.jpg'}], 'start': 14271.076, 'title': 'The role of mathematics in data science', 'summary': 'Emphasizes the power and flexibility of regular expressions, the importance of mathematics in guiding procedures and providing a conceptual understanding, the application of algebra and linear algebra in predicting salaries, the benefits of using linear algebra in solving systems of equations and its importance in optimization, and using calculus to optimize online dating pricing, presenting a 20% cost reduction leading to a 33% sales increase and 7% total revenue increase. 
additionally, it explains big o notation, measuring time complexity and its significant impact on the speed and variability of operations.', 'chapters': [{'end': 14403.387, 'start': 14271.076, 'title': 'The power of regular expressions in data science', 'summary': 'Emphasizes the power and flexibility of regular expressions for finding data in data science projects, while highlighting the importance of tools like excel, tableau, r, python, and bash command line utility for data manipulation and visualization.', 'duration': 132.311, 'highlights': ["Regular expressions are very powerful and flexible for finding the right data for your project, and it's recommended to get comfortable with them (Relevance Score: 5)", "Excel and Tableau are fundamental for getting data from clients and interactive data visualization, and it's strongly recommended to get very comfortable with both tools (Relevance Score: 4)", 'Learning R or Python, and ideally both, is important as they can be used hand in hand for coding in data science (Relevance Score: 3)', 'The relationship between data tools and data science is emphasized, highlighting that data science is much bigger than just the tools (Relevance Score: 2)', "It's important to remember that data tools are an important part of data science, but data science itself is much bigger than just the tools (Relevance Score: 1)"]}, {'end': 14677.883, 'start': 14404.502, 'title': 'Importance of mathematics in data science', 'summary': 'Emphasizes the importance of mathematics in data science, highlighting its role in guiding procedures, diagnosing problems, and providing a conceptual understanding. 
it also stresses the significance of domain expertise and the goal of extracting meaning from data.', 'duration': 273.381, 'highlights': ['The goal is meaning, extracting meaning out of your data to make informed choices.', "Mathematics matters to data science as it allows you to know which procedures to use and why, so you can answer your questions in a way that's the most informative and most useful.", 'Domain expertise is required for data science, necessitating an understanding of specific challenges, workable answers, and available data.', 'A little bit of calculus, big O, probability theory, and Bayes theorem are covered in the course.', 'The significance of algebra, particularly elementary algebra, linear or matrix algebra, and systems of linear equations, is highlighted as foundational to data science.']}, {'end': 15086.54, 'start': 14678.523, 'title': 'Algebra in data science', 'summary': 'Discusses the application of algebra and linear algebra in data science, using real data to predict salaries and explaining the concepts of scalar, vector, and matrix in relation to linear algebra with practical examples.', 'duration': 408.017, 'highlights': ['Algebra as vital to data science', 'Application of algebra in predicting salaries', 'Explanation of scalar, vector, and matrix in linear algebra']}, {'end': 15793.437, 'start': 15086.64, 'title': 'Linear algebra and systems of equations', 'summary': 'Discusses the compact representation and benefits of using linear algebra, including the solution of systems of linear equations, demonstrated through a sales example, and the graphical representation of solutions. 
additionally, it highlights the importance of calculus in data science, emphasizing its role in optimization and practical applications.', 'duration': 706.797, 'highlights': ['Linear algebra provides a compact way of representing data and coefficients, making it conceptually simpler and easier for computers to solve problems.', 'Solving systems of linear equations allows the balancing of unknowns and finding unique solutions, demonstrated through a sales example where 60 cases were sold at $20 each and 940 cases at $5 each.', 'The graphical representation of systems of linear equations provides a visual way of solving and understanding the solutions, demonstrating the intersection at 60 and 940 for the sales scenarios.', 'Calculus is vital to practical data science, serving as the foundation of statistics and forming the core needed for doing optimization, as it enables finding values that maximize or minimize outcomes.']}, {'end': 16267.794, 'start': 15793.777, 'title': 'Optimizing online dating pricing', 'summary': 'Discusses using calculus to optimize the pricing for an online dating service, demonstrating how lowering the cost by 20% can increase sales by 33% and total revenue by 7%.', 'duration': 474.017, 'highlights': ['Lowering the cost by 20% can increase sales by 33% and total revenue by 7%.', 'The current revenue is $90,000 per year, but optimizing the price can increase it to $96,000, a 7% increase.', 'Deriving the function and finding its derivative allows for optimizing the price to achieve maximum total revenue.']}, {'end': 16575.747, 'start': 16270.476, 'title': 'Big o: understanding growth rates in operations', 'summary': 'Explains big o notation, which measures the time complexity of algorithms, including examples of different growth rates like constant, logarithmic, linear, log-linear, quadratic, exponential, and factorial, emphasizing the significant impact on the speed and variability of operations.', 'duration': 305.271, 'highlights': ['The 
chapter introduces Big O notation, explaining its significance in measuring the time complexity of algorithms and how it quantifies the growth rate of operations.', 'Examples of different growth rates and their impact on time complexity are illustrated, such as constant order (O(1)), logarithmic, linear, log-linear, quadratic, exponential, and factorial, with specific examples and comparisons of their speed and efficiency.', 'The variability of different functions and their impact on speed and efficiency is emphasized, such as the variability between insertion sort and selection sort methods, highlighting the importance of understanding the demands and efficiency of different operations.']}], 'duration': 2304.671, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3014271076.jpg', 'highlights': ['Using calculus to optimize online dating pricing, presenting a 20% cost reduction leading to a 33% sales increase and 7% total revenue increase', "Regular expressions are very powerful and flexible for finding the right data for your project, and it's recommended to get comfortable with them", "Excel and Tableau are fundamental for getting data from clients and interactive data visualization, and it's strongly recommended to get very comfortable with both tools", 'Learning R or Python, and ideally both, is important as they can be used hand in hand for coding in data science', 'The significance of algebra, particularly elementary algebra, linear or matrix algebra, and systems of linear equations, is highlighted as foundational to data science', 'The goal is meaning, extracting meaning out of your data to make informed choices', "Mathematics matters to data science as it allows you to know which procedures to use and why, so you can answer your questions in a way that's the most informative and most useful", 'Linear algebra provides a compact way of representing data and coefficients, making it conceptually simpler and 
easier for computers to solve problems', 'Solving systems of linear equations allows the balancing of unknowns and finding unique solutions, demonstrated through a sales example where 60 cases were sold at $20 each and 940 cases at $5 each', 'The graphical representation of systems of linear equations provides a visual way of solving and understanding the solutions, demonstrating the intersection at 60 and 940 for the sales scenarios', 'The chapter introduces Big O notation, explaining its significance in measuring the time complexity of algorithms and how it quantifies the growth rate of operations', 'Examples of different growth rates and their impact on time complexity are illustrated, such as constant order (O(1)), logarithmic, linear, log-linear, quadratic, exponential, and factorial, with specific examples and comparisons of their speed and efficiency']}, {'end': 17722.755, 'segs': [{'end': 16605.61, 'src': 'embed', 'start': 16575.747, 'weight': 1, 'content': [{'end': 16582.213, 'text': 'just run through every single possible solution, or you know, your company will be dead before you get an answer.', 'start': 16575.747, 'duration': 6.466}, {'end': 16587.699, 'text': 'So be mindful of that so you can use your time well and get the insight you need in the time that you need it.', 'start': 16582.794, 'duration': 4.905}, {'end': 16597.285, 'text': 'A really important element of the mathematics and data science and one of its foundational principles is probability.', 'start': 16590.619, 'duration': 6.666}, {'end': 16605.61, 'text': 'Now, one of the things that probability comes in intuitively for a lot of people is something like rolling dice or looking at sports outcomes.', 'start': 16597.805, 'duration': 7.805}], 'summary': 'Use probability to find solutions efficiently and make informed decisions.', 'duration': 29.863, 'max_score': 16575.747, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3016575747.jpg'}, {'end': 16685.581, 'src': 'embed', 'start': 16639.947, 'weight': 0, 'content': [{'end': 16646.713, 'text': "When you take all of the probabilities together, you get what's called the probability space, and that's why we have S.", 'start': 16639.947, 'duration': 6.766}, {'end': 16649.955, 'text': "And it all adds up to one, because you've now covered 100% of the possibilities.", 'start': 16646.713, 'duration': 3.242}, {'end': 16653.979, 'text': 'Also, you can talk about the complement.', 'start': 16651.997, 'duration': 1.982}, {'end': 16662.068, 'text': 'The tilde here is used to say probability of not A is equal to one minus the probability of A, because those have to add up.', 'start': 16653.979, 'duration': 8.089}, {'end': 16668.415, 'text': "So let's take a look at something also about conditional probabilities, which is really important in statistics.", 'start': 16663.33, 'duration': 5.085}, {'end': 16674.176, 'text': 'A conditional probability is the probability of something if something else is true.', 'start': 16669.633, 'duration': 4.543}, {'end': 16682.24, 'text': "You write it this way the probability of and that vertical line is called a pipe, and it's read as assuming that or given that.", 'start': 16674.655, 'duration': 7.585}, {'end': 16685.581, 'text': 'so you can read this as probability of a given.', 'start': 16682.24, 'duration': 3.341}], 'summary': 'Probability space forms 100% coverage. 
conditional probabilities are crucial in statistics.', 'duration': 45.634, 'max_score': 16639.947, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3016639947.jpg'}, {'end': 16732.236, 'src': 'embed', 'start': 16706.896, 'weight': 4, 'content': [{'end': 16712.34, 'text': "But I do want to say a few things about arithmetic with probabilities because it doesn't always work the way that people think it will.", 'start': 16706.896, 'duration': 5.444}, {'end': 16715.582, 'text': "Let's start by talking about adding probabilities.", 'start': 16712.94, 'duration': 2.642}, {'end': 16719.885, 'text': "Let's say you have two events A and B.", 'start': 16715.963, 'duration': 3.922}, {'end': 16722.608, 'text': "And let's say you want to find the probability of either one of those events.", 'start': 16719.885, 'duration': 2.723}, {'end': 16725.689, 'text': "So that's like adding the probabilities of the two events.", 'start': 16722.628, 'duration': 3.061}, {'end': 16732.236, 'text': "Well, it's kind of easy to take the probability of event A and you add the probability of event B.", 'start': 16726.471, 'duration': 5.765}], 'summary': "Arithmetic with probabilities doesn't always work as expected.", 'duration': 25.34, 'max_score': 16706.896, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3016706896.jpg'}, {'end': 17288.629, 'src': 'embed', 'start': 17264.067, 'weight': 5, 'content': [{'end': 17270.092, 'text': 'Number one, you want to use statistics to both summarize your data and to generalize from one group to another if you can.', 'start': 17264.067, 'duration': 6.025}, {'end': 17273.534, 'text': "On the other hand, there's no one true answer with data.", 'start': 17271.393, 'duration': 2.141}, {'end': 17277.457, 'text': 'You got to be flexible in terms of what your goals are and the shared knowledge.', 'start': 17273.554, 'duration': 3.903}, {'end': 17282.221, 'text': 
"And no matter what you're doing, the utility of your analysis should guide you in your decisions.", 'start': 17278.058, 'duration': 4.163}, {'end': 17288.629, 'text': 'The first thing we want to cover in statistics and data science is the principle of exploring data,', 'start': 17284.187, 'duration': 4.442}], 'summary': 'Use statistics to summarize and generalize data, while being flexible and guided by utility in analysis.', 'duration': 24.562, 'max_score': 17264.067, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3017264067.jpg'}, {'end': 17352.325, 'src': 'embed', 'start': 17321.908, 'weight': 7, 'content': [{'end': 17323.51, 'text': 'Now there are two general approaches to this.', 'start': 17321.908, 'duration': 1.602}, {'end': 17326.713, 'text': "First off, there's a graphical exploration.", 'start': 17323.95, 'duration': 2.763}, {'end': 17331.136, 'text': 'So you use graphs and pictures and visualizations to explore your data.', 'start': 17327.013, 'duration': 4.123}, {'end': 17336.946, 'text': 'The reason you want to do this is that graphics are very dense in information.', 'start': 17332.119, 'duration': 4.827}, {'end': 17338.708, 'text': "They're also really good.", 'start': 17337.366, 'duration': 1.342}, {'end': 17342.032, 'text': 'In fact, the best way to get the overall impression of your data.', 'start': 17338.728, 'duration': 3.304}, {'end': 17346.258, 'text': 'Second to that, there is numerical exploration.', 'start': 17343.554, 'duration': 2.704}, {'end': 17352.325, 'text': 'I make it very clear, this is the second step, do the visualization first, then do the numerical part.', 'start': 17347.119, 'duration': 5.206}], 'summary': 'Two approaches: graphical exploration and numerical exploration are used to analyze data.', 'duration': 30.417, 'max_score': 17321.908, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3017321908.jpg'}, {'end': 
17434.228, 'src': 'embed', 'start': 17409.706, 'weight': 6, 'content': [{'end': 17415.589, 'text': 'And also, you want to explore your data thoroughly before you start modeling it, before you build statistical models.', 'start': 17409.706, 'duration': 5.883}, {'end': 17424.755, 'text': 'And all the way through, you want to make sure you listen carefully so that you can find hidden or unassumed details and leads in your data.', 'start': 17416.29, 'duration': 8.465}, {'end': 17434.228, 'text': 'As we move in our discussion of statistics and exploring data, the single most important thing we can do is exploratory graphics.', 'start': 17427.313, 'duration': 6.915}], 'summary': 'Thoroughly explore data before modeling; emphasize exploratory graphics.', 'duration': 24.522, 'max_score': 17409.706, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3017409706.jpg'}, {'end': 17509.098, 'src': 'embed', 'start': 17480.905, 'weight': 8, 'content': [{'end': 17484.147, 'text': 'Now we want to do graphics first for a couple of reasons.', 'start': 17480.905, 'duration': 3.242}, {'end': 17488.41, 'text': "Number one is they're very information dense and fundamentally humans are visual.", 'start': 17484.227, 'duration': 4.183}, {'end': 17493.833, 'text': "It's our single highest bandwidth way of getting information.", 'start': 17488.95, 'duration': 4.883}, {'end': 17497.575, 'text': "It's also the best way to check for shape and gaps and outliers.", 'start': 17494.353, 'duration': 3.222}, {'end': 17500.336, 'text': 'There are a few ways you can do this if you want to.', 'start': 17498.536, 'duration': 1.8}, {'end': 17502.917, 'text': 'The first is with programs that rely on code.', 'start': 17500.356, 'duration': 2.561}, {'end': 17509.098, 'text': 'So you can use the statistical programming language R, the general purpose programming language Python.', 'start': 17503.537, 'duration': 5.561}], 'summary': 'Graphics are 
information-dense and essential for shape and outlier detection using r or python.', 'duration': 28.193, 'max_score': 17480.905, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3017480905.jpg'}], 'start': 16575.747, 'title': 'Probability and statistics in decision making', 'summary': 'Covers the importance of probability in decision-making and time management, conditional probabilities, arithmetic with probabilities, and the significance of statistics in data science. it emphasizes the principles of probability, the concept of probability space, and exploratory data analysis techniques, illustrating the value of graphics in understanding and validating data.', 'chapters': [{'end': 16662.068, 'start': 16575.747, 'title': 'Probability basics and time management', 'summary': 'Discusses the importance of probability in decision-making and time management, emphasizing the principles of probability and the concept of probability space, which ranges from 0 to 1 and covers 100% of the possibilities.', 'duration': 86.321, 'highlights': ['The concept of probability space, ranging from 0 to 1, covers 100% of the possibilities, ensuring a comprehensive understanding of potential outcomes.', 'Probability principles, including the calculation of probabilities using P(A) and P(B), are crucial in decision-making and analyzing various scenarios.', "Emphasizing the importance of time management, as failure to efficiently utilize time can lead to detrimental consequences for the company's survival."]}, {'end': 17174.764, 'start': 16663.33, 'title': 'Conditional probabilities in statistics', 'summary': 'Discusses the concept of conditional probabilities, arithmetic with probabilities, and the importance of statistics in data science, emphasizing the role of statistics in summarizing and generalizing data.', 'duration': 511.434, 'highlights': ['The chapter discusses the concept of conditional probabilities, emphasizing its 
importance in statistics.', 'The chapter explains arithmetic with probabilities, including adding and multiplying probabilities, and the adjustments required for overlapping events.', 'The importance of statistics in data science is discussed, focusing on the challenges of summarizing and generalizing data.']}, {'end': 17722.755, 'start': 17175.205, 'title': 'Exploring data in statistics and data science', 'summary': 'Emphasizes the importance of exploratory data analysis, covering graphical and numerical methods to explore data, and discusses the value of graphics in understanding and validating data, as well as the significance of statistical exploratory graphics in data analysis.', 'duration': 547.55, 'highlights': ['The chapter emphasizes the importance of exploratory data analysis', 'Covering graphical and numerical methods to explore data', 'The value of graphics in understanding and validating data', 'The significance of statistical exploratory graphics in data analysis']}], 'duration': 1147.008, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3016575747.jpg', 'highlights': ['The concept of probability space, ranging from 0 to 1, covers 100% of the possibilities, ensuring a comprehensive understanding of potential outcomes.', 'Probability principles are crucial in decision-making and analyzing various scenarios.', "Emphasizing the importance of time management for company's survival.", 'The chapter discusses the concept of conditional probabilities, emphasizing its importance in statistics.', 'The chapter explains arithmetic with probabilities, including adding and multiplying probabilities, and the adjustments required for overlapping events.', 'The importance of statistics in data science is discussed, focusing on the challenges of summarizing and generalizing data.', 'The chapter emphasizes the importance of exploratory data analysis.', 'Covering graphical and numerical methods to explore data.', 'The 
value of graphics in understanding and validating data.', 'The significance of statistical exploratory graphics in data analysis.']}, {'end': 18656.922, 'segs': [{'end': 17750.347, 'src': 'embed', 'start': 17723.856, 'weight': 0, 'content': [{'end': 17729.119, 'text': 'And then finally, you want to go to many variables, that is multivariate distributions.', 'start': 17723.856, 'duration': 5.263}, {'end': 17733.061, 'text': 'Now, one big question here is 3D or not 3D.', 'start': 17729.979, 'duration': 3.082}, {'end': 17738.567, 'text': 'Let me actually make an argument for not 3D.', 'start': 17736.23, 'duration': 2.337}, {'end': 17746.204, 'text': 'So what I have here is a 3D scatterplot of three variables about Google searches.', 'start': 17740.959, 'duration': 5.245}, {'end': 17750.347, 'text': 'Up the left, I have FIFA, which is for professional soccer.', 'start': 17747.004, 'duration': 3.343}], 'summary': 'Exploring multivariate distributions, debating 3d visualization for google searches data.', 'duration': 26.491, 'max_score': 17723.856, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3017723856.jpg'}, {'end': 17827.798, 'src': 'embed', 'start': 17788.795, 'weight': 5, 'content': [{'end': 17793.578, 'text': "And truthfully, most 3d plots I've worked with are just kind of nightmares.", 'start': 17788.795, 'duration': 4.783}, {'end': 17797.24, 'text': "They seem like they're a good idea, but not really.", 'start': 17794.438, 'duration': 2.802}, {'end': 17800.262, 'text': "So, here's the deal.", 'start': 17799.041, 'duration': 1.221}, {'end': 17804.984, 'text': "3D graphics like the one I just showed you, because they're actually being shown in 2D.", 'start': 17801.082, 'duration': 3.902}, {'end': 17807.366, 'text': "they have to be in motion for you to tell what's going on at all.", 'start': 17804.984, 'duration': 2.382}, {'end': 17810.127, 'text': "And fundamentally, they're hard to read and confusing.", 
'start': 17807.986, 'duration': 2.141}, {'end': 17814.53, 'text': "Now, it's true they might be useful for finding clusters in three dimensions.", 'start': 17810.568, 'duration': 3.962}, {'end': 17815.931, 'text': "We didn't see that in the data we had.", 'start': 17814.57, 'duration': 1.361}, {'end': 17818.632, 'text': 'But generally, I just avoid them like the plague.', 'start': 17816.491, 'duration': 2.141}, {'end': 17822.675, 'text': 'What you want to do, however, is see the connection between several variables.', 'start': 17819.433, 'duration': 3.242}, {'end': 17824.736, 'text': 'You might want to use a matrix of plots.', 'start': 17822.795, 'duration': 1.941}, {'end': 17827.798, 'text': 'This is where you have, for instance, many quantitative variables.', 'start': 17825.377, 'duration': 2.421}], 'summary': '3d plots are hard to read and confusing, better to use matrix of plots for multiple variables.', 'duration': 39.003, 'max_score': 17788.795, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3017788795.jpg'}, {'end': 17915.508, 'src': 'embed', 'start': 17883.215, 'weight': 2, 'content': [{'end': 17887.299, 'text': 'And so in sum, I can say this about graphical exploration of data.', 'start': 17883.215, 'duration': 4.084}, {'end': 17889.019, 'text': "It's a critical first step.", 'start': 17887.879, 'duration': 1.14}, {'end': 17891.14, 'text': 'This is basically where you always want to start.', 'start': 17889.159, 'duration': 1.981}, {'end': 17894.081, 'text': 'And you want to use the quick and easy methods.', 'start': 17891.64, 'duration': 2.441}, {'end': 17899.963, 'text': "Again, bar charts, scatter plots are really easy to make and they're very easy to understand.", 'start': 17894.241, 'duration': 5.722}, {'end': 17907.185, 'text': "And once you're done with the graphical exploration, then you can go to the second step, which is exploring the data through numbers.", 'start': 17900.783, 'duration': 
6.402}, {'end': 17915.508, 'text': 'The next step in statistics and exploring data is exploratory statistics or numerical exploration of data.', 'start': 17908.926, 'duration': 6.582}], 'summary': 'Graphical exploration is a critical first step, followed by numerical exploration.', 'duration': 32.293, 'max_score': 17883.215, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3017883215.jpg'}, {'end': 18001.585, 'src': 'embed', 'start': 17972.627, 'weight': 7, 'content': [{'end': 17976.251, 'text': 'And the idea with a robust statistics is that they are stable.', 'start': 17972.627, 'duration': 3.624}, {'end': 17982.057, 'text': 'is that even when the data varies in sort of unpredictable ways, you still get the same general impression.', 'start': 17976.251, 'duration': 5.806}, {'end': 17984.978, 'text': 'This is a class of statistics.', 'start': 17982.917, 'duration': 2.061}, {'end': 17992.201, 'text': "it's an entire category that's less affected by outliers and by skewness and kurtosis and other abnormalities in the data.", 'start': 17984.978, 'duration': 7.223}, {'end': 17994.042, 'text': "So let's take a quick look.", 'start': 17992.881, 'duration': 1.161}, {'end': 17996.603, 'text': 'This is a very skewed distribution I created.', 'start': 17994.302, 'duration': 2.301}, {'end': 18001.585, 'text': 'The median, which is the dark line there in the box, is right around 1.', 'start': 17997.283, 'duration': 4.302}], 'summary': 'Robust statistics provide stable results in the presence of data variations and abnormalities, with the median of a skewed distribution around 1.', 'duration': 28.958, 'max_score': 17972.627, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3017972627.jpg'}, {'end': 18083.161, 'src': 'embed', 'start': 18059.942, 'weight': 1, 'content': [{'end': 18068.711, 'text': "And so, that's an interesting example of how you can use robust statistics 
to explore data even when you have things like strong skewness.", 'start': 18059.942, 'duration': 8.769}, {'end': 18072.379, 'text': 'Next is the principle of resampling.', 'start': 18070.638, 'duration': 1.741}, {'end': 18079.12, 'text': "And that's like pulling marbles repeatedly out of a jar, counting the colors, putting them back in and trying again.", 'start': 18072.939, 'duration': 6.181}, {'end': 18083.161, 'text': "That's an empirical estimate of sampling variability.", 'start': 18079.961, 'duration': 3.2}], 'summary': 'Robust statistics can handle strong skewness. resampling provides empirical estimate of sampling variability.', 'duration': 23.219, 'max_score': 18059.942, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3018059942.jpg'}, {'end': 18134.284, 'src': 'embed', 'start': 18106.749, 'weight': 8, 'content': [{'end': 18110.21, 'text': "Here's our caterpillars in the process of transforming into butterflies.", 'start': 18106.749, 'duration': 3.461}, {'end': 18116.797, 'text': "But the idea here is you take a sort of difficult data set and then you do what's called a smooth function.", 'start': 18111.235, 'duration': 5.562}, {'end': 18122.379, 'text': "There's no jumps in it and something that preserves the order and allows you to work on the full data set.", 'start': 18117.077, 'duration': 5.302}, {'end': 18127.681, 'text': 'So you can fix skewed data and in a scatter plot, you might have a curved line.', 'start': 18122.98, 'duration': 4.701}, {'end': 18129.082, 'text': 'You can fix that.', 'start': 18127.721, 'duration': 1.361}, {'end': 18134.284, 'text': "And probably the best way to look at this is with something called Tukey's ladder of powers.", 'start': 18129.882, 'duration': 4.402}], 'summary': "Transforming caterpillars into butterflies using smooth functions to fix skewed data in scatter plots with tukey's ladder of powers.", 'duration': 27.535, 'max_score': 18106.749, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3018106749.jpg'}, {'end': 18217.647, 'src': 'embed', 'start': 18193.841, 'weight': 4, 'content': [{'end': 18201.318, 'text': "In sum, let's say this, Statistical or numerical exploration allows you to get multiple perspectives on your data.", 'start': 18193.841, 'duration': 7.477}, {'end': 18207.347, 'text': 'It also allows you to check the stability, see how it works with outliers and skewness and mixed distributions and so on.', 'start': 18201.879, 'duration': 5.468}, {'end': 18212.775, 'text': 'And perhaps most importantly, it sets the stage for the statistical modeling of your data.', 'start': 18207.968, 'duration': 4.807}, {'end': 18217.647, 'text': 'As a final step of statistics and exploring data.', 'start': 18214.665, 'duration': 2.982}], 'summary': 'Statistical exploration provides multiple perspectives on data, checks stability, outliers, skewness, and sets stage for statistical modeling.', 'duration': 23.806, 'max_score': 18193.841, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3018193841.jpg'}, {'end': 18291.304, 'src': 'embed', 'start': 18257.643, 'weight': 9, 'content': [{'end': 18260.145, 'text': "so there's a few different procedures for doing this number.", 'start': 18257.643, 'duration': 2.502}, {'end': 18264.227, 'text': 'one you want to describe the center of your distribution of data.', 'start': 18260.145, 'duration': 4.082}, {'end': 18267.289, 'text': "that's, if you're going to pick a single number, use that.", 'start': 18264.227, 'duration': 3.062}, {'end': 18273.452, 'text': 'two, if you can give a second number, give something about the spread or the dispersion of the variability.', 'start': 18267.289, 'duration': 6.163}, {'end': 18277.515, 'text': "and three, it's also nice to be able to describe the shape of the distribution.", 'start': 18273.452, 'duration': 4.063}, {'end': 18280.016, 'text': 
'let me say more about each of these in turn.', 'start': 18277.515, 'duration': 2.501}, {'end': 18281.457, 'text': "first, let's talk about center.", 'start': 18280.016, 'duration': 1.441}, {'end': 18283.458, 'text': 'we have the center of our rings here.', 'start': 18281.457, 'duration': 2.001}, {'end': 18291.304, 'text': 'Now there are a few very common measures of center or location or central tendency of a distribution.', 'start': 18284.679, 'duration': 6.625}], 'summary': 'Describing data distribution involves center, spread, and shape; using common measures of central tendency.', 'duration': 33.661, 'max_score': 18257.643, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3018257643.jpg'}], 'start': 17723.856, 'title': 'Multivariate distributions analysis', 'summary': 'Explores the analysis of multivariate distributions using 2d scatterplots for google search data on fifa, nfl, and nba, and emphasizes the limitations of 3d graphics for data visualization, advocating for 2d matrix plots and discussing robust statistics, resampling, and descriptive statistics for setting the stage for statistical modeling.', 'chapters': [{'end': 17766.622, 'start': 17723.856, 'title': 'Multivariate distributions analysis', 'summary': 'Explores the analysis of multivariate distributions and argues for the use of 2d scatterplots instead of 3d ones, demonstrating the concept using google search data for fifa, nfl, and nba.', 'duration': 42.766, 'highlights': ['Using 2D scatterplots for multivariate distributions analysis is more effective than 3D scatterplots, as demonstrated through Google search data for FIFA, NFL, and NBA.', 'The use of 2D scatterplots provides interactive features such as click and drag, enhancing the visualization experience.']}, {'end': 18059.382, 'start': 17767.402, 'title': 'Exploration of 3d graphics', 'summary': 'Discusses the limitations of 3d graphics for data visualization and emphasizes the 
preference for 2d matrix plots, highlighting the benefits of robust statistics for stable data analysis and comparison, reinforcing the importance of graphical exploration for data analysis.', 'duration': 291.98, 'highlights': ['The limitations of 3D graphics for data visualization', 'Preference for 2D matrix plots over 3D graphics', 'Importance of graphical exploration for data analysis', 'Benefits of robust statistics for stable data analysis']}, {'end': 18656.922, 'start': 18059.942, 'title': 'Exploratory data analysis', 'summary': "Discusses robust statistics, resampling, transforming variables, and descriptive statistics, including measures of center, spread, and shape of the distribution, to gain multiple perspectives on data and to set the stage for statistical modeling, with examples such as tukey's ladder of powers and the pros and cons of mode, median, and mean.", 'duration': 596.98, 'highlights': ['The chapter discusses robust statistics, resampling, transforming variables, and descriptive statistics, including measures of center, spread, and shape of the distribution, to gain multiple perspectives on data and to set the stage for statistical modeling.', 'Exploring data using numerical methods allows checking the stability, dealing with outliers and skewness, and examining mixed distributions.', 'The principle of resampling, including the jackknife, bootstrap, permutation, and cross-validation, provides empirical estimates of sampling variability.', "Transforming variables involves using smooth functions, such as Tukey's ladder of powers, to fix skewed data and curved lines in scatter plots.", 'Descriptive statistics aims to use a few numbers to stand in for a large collection of numbers, focusing on measures of center, spread, and shape of the distribution.']}], 'duration': 933.066, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3017723856.jpg', 'highlights': ['Using 2D scatterplots for 
multivariate distributions analysis is more effective than 3D scatterplots, as demonstrated through Google search data for FIFA, NFL, and NBA.', 'The chapter discusses robust statistics, resampling, transforming variables, and descriptive statistics, including measures of center, spread, and shape of the distribution, to set the stage for statistical modeling.', 'The use of 2D scatterplots provides interactive features such as click and drag, enhancing the visualization experience.', 'The principle of resampling provides empirical estimates of sampling variability, including the jackknife, bootstrap, permutation, and cross-validation.', 'Exploring data using numerical methods allows checking the stability, dealing with outliers and skewness, and examining mixed distributions.', 'Preference for 2D matrix plots over 3D graphics for data visualization.', 'Importance of graphical exploration for data analysis.', 'Benefits of robust statistics for stable data analysis.', "Transforming variables involves using smooth functions, such as Tukey's ladder of powers, to fix skewed data and curved lines in scatter plots.", 'Descriptive statistics aims to use a few numbers to stand in for a large collection of numbers, focusing on measures of center, spread, and shape of the distribution.', 'The limitations of 3D graphics for data visualization.']}, {'end': 21126.572, 'segs': [{'end': 18698.456, 'src': 'embed', 'start': 18674.914, 'weight': 4, 'content': [{'end': 18682.039, 'text': 'The formulas for the variance and the standard deviation are slightly different for populations and samples, in that they use different denominators.', 'start': 18674.914, 'duration': 7.125}, {'end': 18686.405, 'text': 'But they give similar answers, not identical, but similar.', 'start': 18682.902, 'duration': 3.503}, {'end': 18692.19, 'text': "if the sample is reasonably large, say over 30 or 50, then it's going to be really just a negligible difference.", 'start': 18686.405, 'duration': 5.785}, 
{'end': 18695.433, 'text': "So let's do a little pro and con of these three things.", 'start': 18693.491, 'duration': 1.942}, {'end': 18696.874, 'text': 'First, the range.', 'start': 18696.054, 'duration': 0.82}, {'end': 18698.456, 'text': "It's very easy to do.", 'start': 18696.894, 'duration': 1.562}], 'summary': 'Variance and standard deviation formulas use different denominators for populations and samples but yield similar results. For samples larger than about 30 to 50, the difference is negligible.', 'duration': 23.542, 'max_score': 18674.914, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3018674914.jpg'}, {'end': 18861.782, 'src': 'embed', 'start': 18831.465, 'weight': 2, 'content': [{'end': 18838.529, 'text': 'The next step in our discussion of statistics and inference is hypothesis testing, a very common procedure in some fields of research.', 'start': 18831.465, 'duration': 7.064}, {'end': 18842.752, 'text': 'I like to think of it as put your money where your mouth is and test your theory.', 'start': 18839.43, 'duration': 3.322}, {'end': 18845.333, 'text': "Here's the Wright brothers out testing their plane.", 'start': 18843.292, 'duration': 2.041}, {'end': 18848.895, 'text': 'Now the basic idea behind hypothesis testing is this.', 'start': 18846.314, 'duration': 2.581}, {'end': 18861.782, 'text': "You start with a question and it's something like what is the probability of X occurring by chance? 
if randomness or meaningless sampling variation is the only explanation?", 'start': 18849.496, 'duration': 12.286}], 'summary': 'Introduction to hypothesis testing in statistics, a common procedure in research.', 'duration': 30.317, 'max_score': 18831.465, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3018831465.jpg'}, {'end': 19316.371, 'src': 'embed', 'start': 19288.777, 'weight': 1, 'content': [{'end': 19291.919, 'text': 'And what it does is it gives you a range of numbers, a high and a low.', 'start': 19288.777, 'duration': 3.142}, {'end': 19296.141, 'text': 'And the higher your level of confidence, the more confident you want to be,', 'start': 19292.419, 'duration': 3.722}, {'end': 19300.463, 'text': 'the wider the range is going to be between your high and your low estimates.', 'start': 19296.141, 'duration': 4.322}, {'end': 19308.204, 'text': "Now there's a fundamental trade-off in what's happening here, and it's the trade-off between accuracy, which means you're on target, or,", 'start': 19302.199, 'duration': 6.005}, {'end': 19313.248, 'text': 'more specifically, that your interval contains the true population value.', 'start': 19308.204, 'duration': 5.044}, {'end': 19316.371, 'text': 'And the idea is that leads you to the correct inference.', 'start': 19313.768, 'duration': 2.603}], 'summary': 'A higher confidence level widens the interval between the high and low estimates; accuracy means the interval contains the true population value.', 'duration': 27.594, 'max_score': 19288.777, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3019288777.jpg'}, {'end': 19983.687, 'src': 'embed', 'start': 19955.105, 'weight': 3, 'content': [{'end': 19960.769, 'text': 'And then maximum likelihood, or ML, is a special case of maximum a posteriori.', 'start': 19955.105, 'duration': 5.664}, {'end': 19969.916, 'text': 'And just in case you like it, we can put it in set notation, OLS is a subset of ML, is a subset of MAP.', 'start': 
19961.629, 'duration': 8.287}, {'end': 19976.441, 'text': 'And so there are connections between these three methods of estimating population parameters.', 'start': 19970.616, 'duration': 5.825}, {'end': 19979.243, 'text': 'Let me just sum it up briefly this way.', 'start': 19977.201, 'duration': 2.042}, {'end': 19983.687, 'text': 'The standards that you use OLS, ML, MAP.', 'start': 19979.843, 'duration': 3.844}], 'summary': 'OLS, ML, and MAP are connected methods for estimating population parameters.', 'duration': 28.582, 'max_score': 19955.105, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3019955105.jpg'}, {'end': 20250.531, 'src': 'embed', 'start': 20225.118, 'weight': 0, 'content': [{'end': 20230.94, 'text': 'One of the most important is going to be feature selection, or the choice of variables to include in your model.', 'start': 20225.118, 'duration': 5.822}, {'end': 20241.004, 'text': "It's sort of like confronting this enormous range of information and trying to choose what matters most, trying to get the needle out of the haystack.", 'start': 20232.001, 'duration': 9.003}, {'end': 20250.531, 'text': 'The goal of feature selection is to select the best features or variables and get rid of uninformative and noisy variables,', 'start': 20241.984, 'duration': 8.547}], 'summary': 'Feature selection is crucial for choosing the best variables in a model, eliminating uninformative and noisy ones.', 'duration': 25.413, 'max_score': 20225.118, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3020225118.jpg'}], 'start': 18657.702, 'title': 'Statistics and data analysis', 'summary': 'Covers descriptive statistics, hypothesis testing, estimation, parameter estimation, model fit assessment, and model validation, with discussions on variance, standard deviation, sample size impact, distribution shape, null and alternative hypotheses, confidence intervals, ols, 
ml, map estimation methods, and model validation techniques such as feature selection, multicollinearity, and generalizability assessments.', 'chapters': [{'end': 18829.394, 'start': 18657.702, 'title': 'Descriptive statistics and data shape', 'summary': 'Covers the calculation of variance and standard deviation, the impact of sample size on their differences, and the importance of understanding the shape of a distribution in descriptive statistics.', 'duration': 171.692, 'highlights': ['The importance of variance and standard deviation in data science procedures', 'Impact of sample size on the difference between sample and population variances', 'Different shapes of distributions and their impact on data interpretation']}, {'end': 19161.803, 'start': 18831.465, 'title': 'Hypothesis testing in statistics', 'summary': 'Discusses the concept of hypothesis testing, its applications in scientific research, medical diagnostics, and decision-making, the process of hypothesis testing, the concept of null and alternative hypotheses, the potential for false positives and false negatives, and the limitations and problems associated with hypothesis testing.', 'duration': 330.338, 'highlights': ['The chapter discusses the applications of hypothesis testing in scientific research, social sciences, medical diagnostics, and decision-making, highlighting its common usage in these fields.', 'The chapter explains the process of hypothesis testing, including the concept of null and alternative hypotheses, and the use of a bell curve to mark regions of rejection and calculate scores for data.', 'The chapter discusses the potential for false positives and false negatives in hypothesis testing, highlighting the risks and implications of each type of error.', 'The chapter highlights the limitations and problems associated with hypothesis testing, including misinterpretation, confounding factors, bias from the use of a cutoff, and answering the wrong question.']}, {'end': 19678.204, 
'start': 19161.963, 'title': 'Estimation and confidence intervals in statistics', 'summary': 'Discusses hypothesis testing, estimation, and confidence intervals in statistics, highlighting the trade-offs between accuracy and precision, the factors affecting the width of confidence intervals, and the importance of confidence intervals in statistical analysis.', 'duration': 516.241, 'highlights': ['The trade-offs between accuracy and precision in estimation are emphasized, illustrating scenarios of accurate but imprecise estimates and precise but inaccurate estimates.', 'The factors affecting the width of confidence intervals are discussed, including higher confidence levels creating wider intervals, larger standard deviations leading to wider intervals, and larger sample sizes resulting in narrower intervals.', 'The importance of including confidence intervals in statistical analysis is highlighted, as they provide information about the population parameter and the variability of the data, making them more informative.', 'The process of estimation is explained, focusing on confidence intervals as the most common version of estimation and their explicit inclusion of variation in the data.', 'The demonstration of confidence intervals including the population mean in a high percentage of randomly selected samples reinforces their reliability and importance in statistical analysis.']}, {'end': 20181.038, 'start': 19680.635, 'title': 'Methods of parameter estimation', 'summary': 'Delves into three common methods of parameter estimation, namely ordinary least squares (ols), maximum likelihood (ml), and maximum a posteriori (map), while also discussing measures of fit such as r-squared, minus 2ll, aic, bic, and chi-squared.', 'duration': 500.403, 'highlights': ['The chapter discusses three common methods of parameter estimation: Ordinary Least Squares (OLS), Maximum Likelihood (ML), and Maximum A Posteriori (MAP).', 'The chapter explains the measures of fit, including 
R-squared, minus 2LL, AIC, BIC, and chi-squared.', 'The concept of Best Linear Unbiased Estimator (BLUE) is introduced in the context of Ordinary Least Squares (OLS).']}, {'end': 21126.572, 'start': 20181.038, 'title': 'Model fit assessment and model validation', 'summary': 'Discusses the assessment of model fit and the importance of model validation, emphasizing the impact of feature selection, the problem of multicollinearity, and methods for assessing model generalizability, including bayesian approach, replication, holdout validation, and cross-validation.', 'duration': 945.534, 'highlights': ['The importance of feature selection in simplifying the statistical model and avoiding overfitting, with the goal of selecting the best variables and eliminating uninformative and noisy variables.', 'The problem of multicollinearity and its impact on the association between predictors and the model, along with various methods for dealing with multicollinearity, such as commonality analysis, dominance analysis, and relative importance weights.', "Methods for assessing model generalizability, including Bayesian approach, replication, holdout validation, and cross-validation, to ensure the model's applicability to other data sets and situations."]}], 'duration': 2468.87, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ua-CiDNNj30/pics/ua-CiDNNj3018657702.jpg', 'highlights': ['The importance of feature selection in simplifying the statistical model and avoiding overfitting, with the goal of selecting the best variables and eliminating uninformative and noisy variables.', 'The trade-offs between accuracy and precision in estimation are emphasized, illustrating scenarios of accurate but imprecise estimates and precise but inaccurate estimates.', 'The chapter discusses the applications of hypothesis testing in scientific research, social sciences, medical diagnostics, and decision-making, highlighting its common usage in these fields.', 'The chapter 
discusses three common methods of parameter estimation: Ordinary Least Squares (OLS), Maximum Likelihood (ML), and Maximum A Posteriori (MAP).', 'The importance of variance and standard deviation in data science procedures']}], 'highlights': ['Data science projected to have high demand for 140-190,000 data scientists and 1.5 million data-savvy managers', 'Data scientists rank among the top 10 highest-paying professions with an average total compensation of $144,000 a year', 'Data science emphasizes creativity and inclusive analysis using tools from coding, statistics, and math', 'Data science skills, particularly statistical analysis and data mining, are among the top job skills in various countries', "Data science emphasizes the importance of listening to all data, even when it doesn't fit standard approaches", 'Data science aims to extract insight from data, promoting multiple problem-solving approaches and inclusive analysis', 'Data prep, statistical modeling, and follow-up are key steps in the data science pathway', 'The diverse roles in data science include engineers, big data specialists, researchers, analysts, and business people', 'Collective skills and expertise are essential for data science and big data', 'The importance of programming languages and tools like R, Python, SQL, Bash, and regular expressions', 'The role of mathematics in data science, including algebra, calculus, big O, probability theory, and Bayes theorem', 'The significance of statistics in exploring and inferring data', 'The necessity of interpretability and communication in presenting data-driven stories and analyses', 'Tips for effective presentations: more charts, less text, and simplified charts', "Importance of aligning data science recommendations with the client's mission", "Value of storytelling in data analysis and addressing client's goals clearly", 'Reproducible research is essential in data science projects for accountability and adaptability', 'Clear and focused 
presentation graphics are emphasized for providing reliable information to clients', 'The importance of explicit goals and specific metrics to measure success', 'The significance of accuracy in measurements and a quantitative method involving a classification table', 'Maximizing true positives and true negatives in data analysis', 'Understanding the social context of measurement', 'The importance of spreadsheets, Tableau, R, Python, and SQL programming languages in data science', 'The significance of JASP, SPSS, Open Science Framework, and various data mining tools in data science', 'R is the top language for data science, with 50% more usage than Python', "The significance of Python's community and over 70,000 packages", 'SQL, C, C++, and Java are essential in data science for working with relational databases and production-level code', 'The significance of algebra, linear or matrix algebra, and systems of linear equations in data science', 'The importance of probability principles in decision-making and analyzing various scenarios', 'The concept of probability space, ranging from 0 to 1, ensuring a comprehensive understanding of potential outcomes', "The importance of time management for company's survival", 'The significance of statistics in data science, focusing on the challenges of summarizing and generalizing data', 'The value of graphics in understanding and validating data', 'Using 2D scatterplots for multivariate distributions analysis and robust statistics for stable data analysis', 'The importance of feature selection in simplifying the statistical model and avoiding overfitting', 'The trade-offs between accuracy and precision in estimation', 'The applications of hypothesis testing in scientific research, social sciences, medical diagnostics, and decision-making', 'The importance of variance and standard deviation in data science procedures']}
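The transcript segment on variance says that the population and sample formulas use different denominators and that the gap becomes negligible once the sample is reasonably large. A minimal sketch of that point, using Python's standard `statistics` module; the helper name `variance_gap` and the example values are illustrative assumptions, not part of the course:

```python
import statistics


def variance_gap(values):
    """Return (population variance, sample variance, relative gap).

    Hypothetical helper for illustration only.
    pvariance divides the sum of squared deviations by n;
    variance divides the same sum by n - 1.
    """
    pop = statistics.pvariance(values)   # denominator n
    samp = statistics.variance(values)   # denominator n - 1
    # For non-constant data the relative gap works out to exactly 1 / n.
    gap = (samp - pop) / samp
    return pop, samp, gap


if __name__ == "__main__":
    small = [2, 4, 4, 4, 5, 5, 7, 9]   # n = 8
    large = small * 10                 # n = 80, same spread
    for data in (small, large):
        pop, samp, gap = variance_gap(data)
        print(len(data), pop, samp, gap)
```

Because the two formulas share a numerator, the relative difference is 1/n, which is why it shrinks to a few percent or less for samples above roughly 30 to 50.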