title
StatsLearning Lecture 1 - part1

description

detail
{'title': 'StatsLearning Lecture 1 - part1', 'heatmap': [{'end': 250.543, 'start': 208.06, 'weight': 0.917}, {'end': 319.374, 'start': 295.817, 'weight': 0.823}, {'end': 412.587, 'start': 368.355, 'weight': 1}, {'end': 496.399, 'start': 467.528, 'weight': 0.731}], 'summary': "Covers Trevor Hastie and Rob Tibshirani's introduction to statistical learning and the evolution of machine learning, statistical learning on a small dataset to predict PSA, and various machine learning applications, such as email spam detection, gene expression profiling, and land use prediction.", 'chapters': [{'end': 186.494, 'segs': [{'end': 29.925, 'src': 'embed', 'start': 0.336, 'weight': 0, 'content': [{'end': 1.277, 'text': "Hi, I'm Trevor Hastie.", 'start': 0.336, 'duration': 0.941}, {'end': 2.418, 'text': "And I'm Rob Tibshirani.", 'start': 1.357, 'duration': 1.061}, {'end': 6.502, 'text': 'And then you say, welcome to the course on statistical learning.', 'start': 3.819, 'duration': 2.683}, {'end': 7.502, 'text': "Hi, I'm Trevor Hastie.", 'start': 6.562, 'duration': 0.94}, {'end': 9.925, 'text': "And I'm Rob Tibshirani.", 'start': 7.583, 'duration': 2.342}, {'end': 10.725, 'text': "Hi, I'm Trevor.", 'start': 10.065, 'duration': 0.66}, {'end': 12.127, 'text': "No, I'm Rob Tibshirani.", 'start': 10.866, 'duration': 1.261}, {'end': 13.128, 'text': "And I'm Trevor Hastie.", 'start': 12.167, 'duration': 0.961}, {'end': 16.01, 'text': 'And welcome to our course on statistical learning.', 'start': 13.848, 'duration': 2.162}, {'end': 20.234, 'text': "This is the first online course we've ever given, and we're really excited to tell you about it.", 'start': 16.471, 'duration': 3.763}, {'end': 21.795, 'text': 'And a little nervous, as you can hear.', 'start': 20.354, 'duration': 1.441}, {'end': 27.401, 'text': 'So by way of background, what is statistical learning? 
Trevor and I are both statisticians.', 'start': 22.496, 'duration': 4.905}, {'end': 29.925, 'text': 'We were actually graduate students here at Stanford in the 80s.', 'start': 27.421, 'duration': 2.504}], 'summary': "Trevor Hastie and Rob Tibshirani welcome viewers to their course on statistical learning at Stanford, the first online course they've given.", 'duration': 29.589, 'max_score': 0.336, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5N9V07EIfIg/pics/5N9V07EIfIg336.jpg'}, {'end': 70.526, 'src': 'embed', 'start': 40.669, 'weight': 2, 'content': [{'end': 44.611, 'text': 'But in the 1980s, people in computer science developed the field of machine learning.', 'start': 40.669, 'duration': 3.942}, {'end': 47.433, 'text': 'Especially neural networks became a very hot topic.', 'start': 45.492, 'duration': 1.941}, {'end': 49.754, 'text': 'I was at University of Toronto and Trevor was at Bell Labs.', 'start': 47.613, 'duration': 2.141}, {'end': 56.678, 'text': 'And one of the first neural networks was developed at Bell Labs to solve the zip code recognition problem,', 'start': 50.514, 'duration': 6.164}, {'end': 59.339, 'text': "which we'll show you a little bit about in a few slides.", 'start': 56.678, 'duration': 2.661}, {'end': 64.562, 'text': 'So around that time, Trevor and I, and then some colleagues, Jerry Friedman, Brad Efron,', 'start': 59.379, 'duration': 5.183}, {'end': 66.203, 'text': 'Leo Breiman.', 'start': 64.702, 'duration': 1.501}, {'end': 70.526, 'text': "And actually, you'll hear from Jerry and Brad both in this course.", 'start': 66.363, 'duration': 4.163}], 'summary': 'In the 1980s, machine learning, especially neural networks, gained popularity. 
Bell Labs developed one of the first neural networks to solve the zip code recognition problem.', 'duration': 29.857, 'max_score': 40.669, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5N9V07EIfIg/pics/5N9V07EIfIg40669.jpg'}, {'end': 167.803, 'src': 'embed', 'start': 125.973, 'weight': 1, 'content': [{'end': 132.436, 'text': "And here's a quote that came in the New York Times in 2009 from Hal Varian, who's the chief economist at Google.", 'start': 125.973, 'duration': 6.463}, {'end': 133.757, 'text': 'You can see the quote there.', 'start': 132.776, 'duration': 0.981}, {'end': 136.898, 'text': 'I keep saying the sexy job in the next 10 years will be statisticians.', 'start': 133.797, 'duration': 3.101}, {'end': 141.541, 'text': "And indeed, there's a picture of Carrie Grimes, who was a graduate of Stanford Statistics.", 'start': 137.339, 'duration': 4.202}, {'end': 144.622, 'text': 'She was one of the first statisticians hired at Google.', 'start': 141.821, 'duration': 2.801}, {'end': 146.563, 'text': 'Now Google has many statisticians.', 'start': 144.862, 'duration': 1.701}, {'end': 150.733, 'text': 'Our next example, this is a picture of Nate Silver on the right.', 'start': 148.191, 'duration': 2.542}, {'end': 154.115, 'text': "Nate has a master's in economics, but he calls himself a statistician.", 'start': 150.813, 'duration': 3.302}, {'end': 158.697, 'text': 'And he writes, at least he did write a blog called FiveThirtyEight for the New York Times.', 'start': 154.915, 'duration': 3.782}, {'end': 167.803, 'text': 'And in that blog, he predicted the outcome of the 2012 presidential and Senate elections very well.', 'start': 159.398, 'duration': 8.405}], 'summary': "In 2009, Google's chief economist predicted that statistician would be the sexy job of the next 10 years. 
Google now employs many statisticians, and Nate Silver accurately predicted the 2012 election outcomes.", 'duration': 41.83, 'max_score': 125.973, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5N9V07EIfIg/pics/5N9V07EIfIg125973.jpg'}], 'start': 0.336, 'title': 'Introduction to statistical learning and evolution of machine learning', 'summary': 'Introduces the online course on statistical learning by Trevor Hastie and Rob Tibshirani, emphasizing their background in applied statistics at Stanford. It also discusses the evolution of machine learning, from neural networks to its real-world applications and notable successes in data analysis and accurate election predictions.', 'chapters': [{'end': 38.316, 'start': 0.336, 'title': 'Introduction to statistical learning', 'summary': 'Introduces the online course on statistical learning by Trevor Hastie and Rob Tibshirani, both statisticians who were graduate students at Stanford in the 80s, emphasizing their excitement about their first online course and their background in applied statistics.', 'duration': 37.98, 'highlights': ['Trevor Hastie and Rob Tibshirani introduce the first online course on statistical learning, expressing their excitement and nervousness about it.', 'Trevor and Rob, both statisticians and former graduate students at Stanford in the 80s, have known each other for about 30 years and have a background in applied statistics.', 'Statistics has been around since about 1900 or before, and Trevor and Rob have been involved in applied statistics.']}, {'end': 186.494, 'start': 40.669, 'title': 'Evolution of machine learning', 'summary': "Discusses the evolution of machine learning, from early neural networks applied to real-world problems like zip code recognition to the success of IBM's Watson, Google's use of data analysis, and the accurate election predictions made by statisticians like Nate 
Silver.", 'duration': 145.825, 'highlights': ['Watson, a computer program built by IBM, triumphed in a three-game match against human players, showcasing the success of machine learning in artificial intelligence. Watson, developed by IBM, won a three-game match against human players, demonstrating the triumph of machine learning in artificial intelligence.', 'Nate Silver accurately predicted the outcome of the 2012 presidential and Senate elections using statistics and carefully sampled data. Nate Silver, a statistician, accurately predicted the outcome of the 2012 presidential and Senate elections using statistics and carefully sampled data.', "Google's chief economist, Hal Varian, predicted that statisticians would have the 'sexy job' in the next 10 years, emphasizing the growing importance of statistics in the tech industry. Hal Varian, Google's chief economist, predicted the growing importance of statisticians in the tech industry, emphasizing the 'sexy job' of statisticians in the next 10 years.", 'The development of machine learning, particularly neural networks, in the 1980s marked a significant advancement in the field of computer science. 
The 1980s saw the development of machine learning, particularly neural networks, which was a significant advancement in computer science.']}], 'duration': 186.158, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5N9V07EIfIg/pics/5N9V07EIfIg336.jpg', 'highlights': ['Trevor Hastie and Rob Tibshirani introduce the first online course on statistical learning, expressing their excitement and nervousness about it.', 'Nate Silver, a statistician, accurately predicted the outcome of the 2012 presidential and Senate elections using statistics and carefully sampled data.', 'The development of machine learning, particularly neural networks, in the 1980s marked a significant advancement in the field of computer science.', "Hal Varian, Google's chief economist, predicted the growing importance of statisticians in the tech industry, emphasizing the 'sexy job' of statisticians in the next 10 years."]}, {'end': 560.287, 'segs': [{'end': 250.543, 'src': 'heatmap', 'start': 208.06, 'weight': 0.917, 'content': [{'end': 209.101, 'text': "It's a trendier word.", 'start': 208.06, 'duration': 1.041}, {'end': 214.124, 'text': "So we're going to run through a number of statistical learning problems.", 'start': 210.822, 'duration': 3.302}, {'end': 218.487, 'text': "You can see there's a bunch of examples on this page and we'll go through them one by one,", 'start': 215.125, 'duration': 3.362}, {'end': 222.829, 'text': "just to give you a flavor of what sorts of problems we're going to be thinking about.", 'start': 218.487, 'duration': 4.342}, {'end': 227.912, 'text': "So the first data set we're going to look at is on prostate cancer.", 'start': 223.59, 'duration': 4.322}, {'end': 240.14, 'text': 'This is a relatively small data set, 97 men, sampled from 97 men with prostate cancer, actually by a Stanford physician, Dr. 
Stamey, in the late 80s.', 'start': 228.353, 'duration': 11.787}, {'end': 250.543, 'text': 'And what we have is the PSA measurement for each subject, along with a number of clinical and blood measurements from the patients,', 'start': 241.497, 'duration': 9.046}], 'summary': 'Running through statistical learning problems with a dataset of 97 men with prostate cancer from the late 80s.', 'duration': 42.483, 'max_score': 208.06, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5N9V07EIfIg/pics/5N9V07EIfIg208060.jpg'}, {'end': 250.543, 'src': 'embed', 'start': 218.487, 'weight': 1, 'content': [{'end': 222.829, 'text': "just to give you a flavor of what sorts of problems we're going to be thinking about.", 'start': 218.487, 'duration': 4.342}, {'end': 227.912, 'text': "So the first data set we're going to look at is on prostate cancer.", 'start': 223.59, 'duration': 4.322}, {'end': 240.14, 'text': 'This is a relatively small data set, 97 men, sampled from 97 men with prostate cancer, actually by a Stanford physician, Dr. 
Stamey, in the late 80s.', 'start': 228.353, 'duration': 11.787}, {'end': 250.543, 'text': 'And what we have is the PSA measurement for each subject, along with a number of clinical and blood measurements from the patients,', 'start': 241.497, 'duration': 9.046}], 'summary': 'A small dataset of 97 men with prostate cancer, including PSA measurements and clinical data, will be analyzed.', 'duration': 32.056, 'max_score': 218.487, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5N9V07EIfIg/pics/5N9V07EIfIg218487.jpg'}, {'end': 319.374, 'src': 'heatmap', 'start': 266.434, 'weight': 0, 'content': [{'end': 271.836, 'text': 'And you see on the diagonal the name of each of the variables, and each little plot is a pair of variables.', 'start': 266.434, 'duration': 5.402}, {'end': 278.657, 'text': "So you get in one picture, if you've got a relatively small number of variables, you can see all the data at once in a picture like this.", 'start': 271.876, 'duration': 6.781}, {'end': 282.859, 'text': 'And you can see the nature of the data, what variables are correlated, and so on.', 'start': 278.718, 'duration': 4.141}, {'end': 286.4, 'text': 'And so this is a good way of getting a view of your data.', 'start': 283.259, 'duration': 3.141}, {'end': 295.777, 'text': 'And in this particular case, the goal was to try and predict the PSA from the other measurements.', 'start': 287.12, 'duration': 8.657}, {'end': 296.937, 'text': "So PSA is along the top.", 'start': 295.817, 'duration': 1.12}, {'end': 299.918, 'text': "And you can see there are some correlations between these measurements.", 'start': 296.997, 'duration': 2.921}, {'end': 310.741, 'text': "Here's actually another view of these data, which looks rather similar, except in the one instance over here, which is the log weight.", 'start': 301.459, 'duration': 9.282}, {'end': 312.321, 'text': 'These variables are on the log scale.', 'start': 310.781, 'duration': 1.54}, {'end': 313.902, 
'text': 'And this is log weight.', 'start': 312.982, 'duration': 0.92}, {'end': 317.091, 'text': "And you notice there's a point over here.", 'start': 315.049, 'duration': 2.042}, {'end': 319.374, 'text': 'It looks like somewhat of an outlier.', 'start': 317.332, 'duration': 2.042}], 'summary': 'Analyzing variables and correlations to predict PSA from measurements, with a focus on visualizing data.', 'duration': 47.468, 'max_score': 266.434, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5N9V07EIfIg/pics/5N9V07EIfIg266434.jpg'}, {'end': 366.394, 'src': 'embed', 'start': 341.294, 'weight': 3, 'content': [{'end': 346.959, 'text': 'Well, we got a message from a retired urologist, Dr. Steven Link, who pointed this out to us.', 'start': 341.294, 'duration': 5.665}, {'end': 351.684, 'text': 'And so we corrected an earlier published version of this scatterplot.', 'start': 347.18, 'duration': 4.504}, {'end': 352.865, 'text': 'Which is a good thing to remember.', 'start': 352.024, 'duration': 0.841}, {'end': 356.708, 'text': 'The first thing to do when you get a set of data for analysis is not to run it through a fancy algorithm.', 'start': 352.905, 'duration': 3.803}, {'end': 358.17, 'text': 'Make some graphs, some plots.', 'start': 356.949, 'duration': 1.221}, {'end': 359.271, 'text': 'Look at the data.', 'start': 358.65, 'duration': 0.621}, {'end': 363.353, 'text': 'I think in the old days before computers, people did that much more because it was easy.', 'start': 359.751, 'duration': 3.602}, {'end': 366.394, 'text': 'I mean, you do it by hand, and the analysis took many, many hours.', 'start': 363.373, 'duration': 3.021}], 'summary': 'Retired urologist Dr. 
Steven Link pointed out an error, leading to a corrected scatterplot and a reminder to visualize data before analysis.', 'duration': 25.1, 'max_score': 341.294, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5N9V07EIfIg/pics/5N9V07EIfIg341294.jpg'}, {'end': 412.587, 'src': 'heatmap', 'start': 368.355, 'weight': 1, 'content': [{'end': 373.937, 'text': 'And we need to remember that even with big data, you should look at it first before you jump in with an analysis.', 'start': 368.355, 'duration': 5.582}, {'end': 379.579, 'text': 'So the next example is phonemes for two vowel sounds.', 'start': 375.057, 'duration': 4.522}, {'end': 384.601, 'text': 'And this is looking at, this graph has the log periodograms of phonemes.', 'start': 380.599, 'duration': 4.002}, {'end': 391.39, 'text': 'for two different phonemes, the power at different frequencies for two different phonemes, AA and AO.', 'start': 385.166, 'duration': 6.224}, {'end': 395.013, 'text': 'How do you pronounce those, Trevor? AA is odd, and AO is ought.', 'start': 391.43, 'duration': 3.583}, {'end': 398.915, 'text': "As you can tell, Trevor talks funny, but hopefully during the course you'll be able to...", 'start': 395.033, 'duration': 3.882}, {'end': 403.085, 'text': 'How could you say that? Odd and ought? 
OK.', 'start': 398.915, 'duration': 4.17}, {'end': 412.587, 'text': 'So you see the log periodograms at various frequencies of these two vowel sounds, spoken by different people, in orange and green.', 'start': 403.606, 'duration': 8.981}], 'summary': 'Analyzing power at different frequencies of phonemes AA and AO to classify the two vowel sounds.', 'duration': 44.232, 'max_score': 368.355, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5N9V07EIfIg/pics/5N9V07EIfIg368355.jpg'}, {'end': 451.2, 'src': 'embed', 'start': 425.15, 'weight': 4, 'content': [{'end': 430.712, 'text': 'looking at trying to classify the two classes from each other based on the power of different frequencies.', 'start': 425.15, 'duration': 5.562}, {'end': 438.215, 'text': 'The logit model, from logistic regression, is used to classify into one of the two vowel sounds based on the log periodogram.', 'start': 431.052, 'duration': 7.163}, {'end': 439.555, 'text': "And we'll cover it in detail in the course.", 'start': 438.255, 'duration': 1.3}, {'end': 448.499, 'text': 'And the estimated coefficients from the logistic model are in the gray profiles here in the bottom plot.', 'start': 440.116, 'duration': 8.383}, {'end': 451.2, 'text': "And you can see they're very non-smooth.", 'start': 448.579, 'duration': 2.621}], 'summary': 'Logistic regression model classifies vowel sounds based on frequency power.', 'duration': 26.05, 'max_score': 425.15, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5N9V07EIfIg/pics/5N9V07EIfIg425150.jpg'}, {'end': 496.399, 'src': 'heatmap', 'start': 467.528, 'weight': 0.731, 'content': [{'end': 475.215, 'text': 'And the red curve shows you pretty clearly the important frequencies: it looks like one vowel sound has more power around 25,', 'start': 467.528, 'duration': 7.687}, {'end': 481.929, 'text': 'and the other vowel sound has more power just before 50.', 'start': 475.215, 'duration': 
6.714}, {'end': 486.612, 'text': 'Predict whether someone will have a heart attack on the basis of demographic, diet, and clinical measurements.', 'start': 481.929, 'duration': 4.683}, {'end': 491.496, 'text': 'So these are actually data on men from South Africa.', 'start': 487.473, 'duration': 4.023}, {'end': 496.399, 'text': "The red ones are those that had heart disease, and the blue points are those that didn't.", 'start': 492.156, 'duration': 4.243}], 'summary': 'Analysis of vowel sound frequencies and heart attack prediction based on demographic and clinical data.', 'duration': 28.871, 'max_score': 467.528, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5N9V07EIfIg/pics/5N9V07EIfIg467528.jpg'}, {'end': 532.797, 'src': 'embed', 'start': 508.067, 'weight': 5, 'content': [{'end': 513.969, 'text': 'Now, when you have a binary response like this, you can color the scatterplot matrix so you can see the points, which is rather handy.', 'start': 508.067, 'duration': 5.902}, {'end': 520.871, 'text': 'And these data come from a region of South Africa where the risk of heart disease is very high.', 'start': 515.01, 'duration': 5.861}, {'end': 523.712, 'text': "It's over 5% for this age group.", 'start': 520.912, 'duration': 2.8}, {'end': 530.376, 'text': 'The people, especially the men (these were men), eat lots of meat.', 'start': 524.333, 'duration': 6.043}, {'end': 532.797, 'text': 'They have meat for all three meals.', 'start': 530.656, 'duration': 2.141}], 'summary': 'In this region of South Africa, the risk of heart disease for men in this age group is over 5%, and meat consumption is high.', 'duration': 24.73, 'max_score': 508.067, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5N9V07EIfIg/pics/5N9V07EIfIg508067.jpg'}], 'start': 186.814, 'title': 'Statistical learning and data visualization', 'summary': 'Introduces statistical learning using a small dataset of 97 men with prostate cancer to predict PSA and 
emphasizes the importance of visualizing data, with examples of phoneme classification and heart disease risk analysis.', 'chapters': [{'end': 340.594, 'start': 186.814, 'title': 'Statistical learning on prostate cancer data', 'summary': 'Introduces statistical learning problems using a small dataset of 97 men with prostate cancer, aiming to predict PSA from other measurements and showcasing the use of a scatterplot matrix for data visualization.', 'duration': 153.78, 'highlights': ['The scatterplot matrix provides a comprehensive view of the data, showcasing the relationships between variables, aiding in understanding correlations and identifying outliers.', 'The dataset consists of 97 men with prostate cancer, including PSA measurements, clinical and blood measurements, as well as data on cancer size and severity, sampled by Dr. Stamey in the late 80s.', 'The goal is to predict PSA from the given measurements, demonstrating the practical application of statistical learning techniques in the medical domain.']}, {'end': 560.287, 'start': 341.294, 'title': 'Importance of data visualization', 'summary': 'Emphasizes the importance of visualizing data before running complex algorithms, with examples of phoneme classification and heart disease risk analysis in South African men.', 'duration': 218.993, 'highlights': ['The chapter highlights the importance of visualizing data before running complex algorithms, emphasizing the need to make graphs and plots to understand the data first.', 'It discusses the example of classifying phonemes based on the power of different frequencies, using a logit model from logistic regression and the application of smoothing techniques to identify important frequencies.', 'The chapter also presents a case study on heart disease risk analysis in South African men, demonstrating the use of a scatterplot matrix and the need to jointly involve different risk factors in developing a risk model.']}], 'duration': 373.473, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5N9V07EIfIg/pics/5N9V07EIfIg186814.jpg', 'highlights': ['The scatterplot matrix provides a comprehensive view of the data, aiding in understanding correlations and identifying outliers.', 'The dataset consists of 97 men with prostate cancer, including PSA measurements, clinical and blood measurements, sampled by Dr. Stamey in the late 80s.', 'The goal is to predict PSA from the given measurements, demonstrating the practical application of statistical learning techniques in the medical domain.', 'The chapter highlights the importance of visualizing data before running complex algorithms, emphasizing the need to make graphs and plots to understand the data first.', 'It discusses the example of classifying phonemes based on the power of different frequencies, using a logit model from logistic regression and the application of smoothing techniques to identify important frequencies.', 'The chapter also presents a case study on heart disease risk analysis in South African men, demonstrating the use of a scatterplot matrix and the need to jointly involve different risk factors in developing a risk model.']}, {'end': 1095.858, 'segs': [{'end': 605.127, 'src': 'embed', 'start': 562.734, 'weight': 0, 'content': [{'end': 564.977, 'text': 'Our next example is email spam detection.', 'start': 562.734, 'duration': 2.243}, {'end': 569.101, 'text': 'Everyone uses email, and spam is definitely a problem.', 'start': 566.759, 'duration': 2.342}, {'end': 573.827, 'text': 'And so spam filters are a very important application of statistical machine learning.', 'start': 569.782, 'duration': 4.045}, {'end': 579.213, 'text': "The data in this table, actually, I think it's from maybe the late 90s.", 'start': 574.448, 'duration': 4.765}, {'end': 582.815, 'text': 'Is that right? 
Yeah, late 90s, exactly.', 'start': 580.314, 'duration': 2.501}, {'end': 584.176, 'text': "It's from Hewlett-Packard.", 'start': 582.875, 'duration': 1.301}, {'end': 587.158, 'text': 'So this is a person named George who worked at Hewlett-Packard.', 'start': 584.196, 'duration': 2.962}, {'end': 591.82, 'text': 'So this was early in the days of email, when spam was also not very sophisticated.', 'start': 587.238, 'duration': 4.582}, {'end': 597.363, 'text': 'So what we have here is data from over 4,000 emails sent to an individual named George at HP Labs.', 'start': 592.38, 'duration': 4.983}, {'end': 600.765, 'text': "Each one's been hand-labeled as either spam or good email.", 'start': 598.063, 'duration': 2.702}, {'end': 602.726, 'text': 'And the goal here is to try to predict.', 'start': 601.345, 'duration': 1.381}, {'end': 605.127, 'text': 'Actually, they call good email ham these days, Rob.', 'start': 603.126, 'duration': 2.001}], 'summary': 'Statistical machine learning applied to late-90s data to detect spam among over 4,000 emails sent to George at HP Labs.', 'duration': 42.393, 'max_score': 562.734, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5N9V07EIfIg/pics/5N9V07EIfIg562734.jpg'}, {'end': 695.404, 'src': 'embed', 'start': 666.999, 'weight': 2, 'content': [{'end': 682.206, 'text': "Right. 
So the goal, and we'll talk about this example in detail, is to use the 57 features (these are seven of those features) together in a classifier to predict whether an email is spam or ham.", 'start': 666.999, 'duration': 15.207}, {'end': 688.601, 'text': 'Identify the numbers in a handwritten zip code.', 'start': 686.42, 'duration': 2.181}, {'end': 690.622, 'text': 'This is what we were alluding to earlier.', 'start': 688.621, 'duration': 2.001}, {'end': 695.404, 'text': 'Here are some handwritten digits taken from envelopes.', 'start': 690.642, 'duration': 4.762}], 'summary': 'Using 57 features in a classifier to predict whether an email is spam or ham.', 'duration': 28.405, 'max_score': 666.999, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5N9V07EIfIg/pics/5N9V07EIfIg666999.jpg'}, {'end': 888.381, 'src': 'embed', 'start': 863.677, 'weight': 3, 'content': [{'end': 871.106, 'text': "Trying to figure out the common patterns of gene expression for women with breast cancer and seeing why there are subcategories of breast cancer showing different gene expression.", 'start': 863.677, 'duration': 7.429}, {'end': 873.529, 'text': 'So what we see here is a heat map of the full data.', 'start': 871.767, 'duration': 1.762}, {'end': 877.653, 'text': '88 women in the columns and about 8,000 genes in the rows.', 'start': 874.65, 'duration': 3.003}, {'end': 883.377, 'text': "And hierarchical clustering, which we'll discuss in the last part of this course, has been applied to the columns.", 'start': 878.393, 'duration': 4.984}, {'end': 888.381, 'text': 'And you see the clustering tree at the top here, which has been expanded for your view at the top.', 'start': 883.437, 'duration': 4.944}], 'summary': 'Analyzing gene expression patterns in 88 women with breast cancer, using about 8,000 genes and hierarchical clustering.', 'duration': 24.704, 'max_score': 863.677, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5N9V07EIfIg/pics/5N9V07EIfIg863677.jpg'}, {'end': 997.873, 'src': 'embed', 'start': 970.286, 'weight': 4, 'content': [{'end': 974.912, 'text': 'And so here we see three of the variables that affect income.', 'start': 970.286, 'duration': 4.626}, {'end': 983.222, 'text': "And again, the goal is to use regression models to try and understand the roles of these variables together and see if there are interactions and so on.", 'start': 975.473, 'duration': 7.749}, {'end': 991.972, 'text': 'And the last example is Landsat images of a land use area in Australia.', 'start': 985.25, 'duration': 6.722}, {'end': 993.792, 'text': 'So this is a rural area of Australia.', 'start': 991.992, 'duration': 1.8}, {'end': 995.333, 'text': 'Those are harsh colors, Rob.', 'start': 993.912, 'duration': 1.421}, {'end': 997.873, 'text': 'Did you choose those colors? You probably did, Trevor.', 'start': 995.473, 'duration': 2.4}], 'summary': 'Regression models analyze variables affecting income; Landsat images of a rural area of Australia are introduced for land use prediction.', 'duration': 27.587, 'max_score': 970.286, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5N9V07EIfIg/pics/5N9V07EIfIg970286.jpg'}, {'end': 1095.858, 'src': 'embed', 'start': 1064.366, 'weight': 5, 'content': [{'end': 1069.61, 'text': 'Yeah. Although we might want to use the fact that nearby pixels are more likely to have the same land use than ones that are far away.', 'start': 1064.366, 'duration': 5.244}, {'end': 1072.552, 'text': "And we'll talk about classifiers.", 'start': 1070.911, 'duration': 1.641}, {'end': 1077.313, 'text': 'I think the one we use here is actually Nearest Neighbors, a very simple classifier, and that produces the prediction in the bottom right.', 'start': 1072.752, 'duration': 4.561}, {'end': 1078.953, 'text': "And you can see it's quite good.", 'start': 1078.013, 'duration': 0.94}, {'end': 1079.693, 'text': "It's not perfect.", 'start': 1078.993, 'duration': 
0.7}, {'end': 1083.435, 'text': "There are a few mistakes it makes, but for the most part it's quite accurate.", 'start': 1079.733, 'duration': 3.702}, {'end': 1087.216, 'text': "Okay, so that's the end of the series of examples.", 'start': 1084.135, 'duration': 3.081}, {'end': 1095.858, 'text': "In the next session, we'll tell you some notation and how we set up problems for supervised learning, which we'll use for the rest of the course.", 'start': 1087.816, 'duration': 8.042}], 'summary': 'The nearest neighbors classifier is quite accurate, with a few mistakes; the next session introduces notation for supervised learning.', 'duration': 31.492, 'max_score': 1064.366, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5N9V07EIfIg/pics/5N9V07EIfIg1064366.jpg'}], 'start': 562.734, 'title': 'Machine learning applications', 'summary': 'Discusses various machine learning applications, including email spam detection with a dataset of over 4,000 emails, handwritten zip code classification, gene expression profiling in breast cancer, demographic variable analysis for income prediction, and land use prediction in rural Australia using Landsat images.', 'chapters': [{'end': 666.719, 'start': 562.734, 'title': 'Email spam detection', 'summary': 'Discusses the application of machine learning in email spam detection, using a dataset of over 4,000 emails from the late 90s to classify spam and good email based on word frequencies, with a focus on how spam has become more sophisticated over time.', 'duration': 103.985, 'highlights': ['Spam filters as a crucial application of statistical machine learning. Spam filters are highlighted as a significant application of machine learning, emphasizing their importance in addressing the prevalent issue of email spam.', 'Dataset of over 4,000 emails from the late 90s for spam detection. The dataset consists of over 4,000 emails, hand-labeled as spam or good email, providing a valuable resource for training and testing 
machine learning models for spam detection.', "Predicting spam or 'ham' (good email) based on word frequencies. The goal is to distinguish spam from good email based on the frequencies of words in each email.", 'Sophistication of spam over time. The discussion highlights how spam has become more sophisticated over time, with spammers personalizing emails, making spam more challenging to detect and prevent.']}, {'end': 1095.858, 'start': 666.999, 'title': 'Machine learning applications', 'summary': 'Discusses machine learning applications: handwritten zip code classification, gene expression profiling in breast cancer, demographic variable analysis for income prediction, and land use prediction in rural Australia using Landsat images.', 'duration': 428.859, 'highlights': ['Using 57 features, a classifier was used to predict whether an email is spam or ham, achieving a low error rate of 4-5%. A classifier was developed using 57 features to predict whether an email is spam or ham, with an error rate of about 4-5%.', 'Gene expression profiling was used to group breast cancer patients into subcategories based on common patterns of gene expression, with hierarchical clustering dividing patients into roughly six subgroups. Gene expression profiling was utilized to group breast cancer patients into subcategories, with hierarchical clustering dividing patients into roughly six subgroups.', 'Regression models were used to understand the relationship between income and demographic variables, showing how income changes with age, education level, and year. 
Regression models were employed to understand the relationship between income and demographic variables, showing how income changes with age, education level, and year.', 'Landsat images were analyzed to predict land use in a rural area of Australia, using features from spectral bands and a Nearest Neighbors classifier, achieving mostly accurate predictions. Landsat images were analyzed to predict land use in a rural area of Australia, using features from spectral bands and a Nearest Neighbors classifier, which produced mostly accurate predictions.']}], 'duration': 533.124, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5N9V07EIfIg/pics/5N9V07EIfIg562734.jpg', 'highlights': ['Spam filters as a crucial application of machine learning', 'Dataset of over 4,000 emails from the late 90s for spam detection', 'Using 57 features, a classifier was built to predict whether an email is spam or ham, achieving a low error rate of about 4-5%', 'Gene expression profiling was used to group breast cancer patients into subcategories based on common patterns of gene expression, with hierarchical clustering dividing the patients into roughly six subgroups', 'Regression models were used to understand the relationship between income and demographic variables, showing how income changes with age, education level, and year', 'Landsat images were analyzed to predict land use in a rural area of Australia, using features from spectral bands and a Nearest Neighbors classifier, achieving mostly accurate predictions']}], 'highlights': ['The development of machine learning, particularly neural networks, in the 1980s marked a significant advancement in the field of computer science.', 'Trevor Hastie and Rob Tibshirani introduce the first online course on statistical learning, expressing their excitement and nervousness about it.', 'Nate Silver, a statistician, accurately predicted the outcome of the 2012 presidential and Senate elections using 
statistics and carefully sampled data.', "Hal Varian, Google's chief economist, predicted that statistician would be 'the sexy job' of the next 10 years, reflecting the growing importance of statisticians in the tech industry.", 'The scatterplot matrix provides a comprehensive view of the data, aiding in understanding correlations and identifying outliers.', 'The dataset covers 97 men with prostate cancer, recording PSA along with clinical and blood measurements, collected by Dr. Stamey in the late 80s.', 'The goal is to predict PSA from the given measurements, demonstrating the practical application of statistical learning techniques in the medical domain.', 'Spam filters as a crucial application of machine learning', 'Using 57 features, a classifier was built to predict whether an email is spam or ham, achieving a low error rate of about 4-5%', 'Landsat images were analyzed to predict land use in a rural area of Australia, using features from spectral bands and a Nearest Neighbors classifier, achieving mostly accurate predictions']}