title
Building Software Systems At Google and Lessons Learned

description
(November 10, 2010) Speaker Jeffrey Dean describes his experiences building software systems at Google and the technology the company uses today. He explains how Google's systems have evolved over time and how its technological infrastructure has enabled its success. Stanford University: http://www.stanford.edu/ School of Engineering: http://soe.stanford.edu/ Stanford Center for Professional Development: http://scpd.stanford.edu/ Stanford University Channel on YouTube: http://www.youtube.com/stanford

detail
{'title': 'Building Software Systems At Google and Lessons Learned', 'heatmap': [{'end': 1545.543, 'start': 1487.563, 'weight': 1}, {'end': 4370.423, 'start': 4317.607, 'weight': 0.721}], 'summary': "Jeff dean, a speaker from google, discusses the evolution of google's systems, including a 1,000x improvement in computational power and a 5x improvement in response time, challenges in scaling data centers, design of in-memory indexing system, handling server crashes, reliability, and availability in software, google file system (gfs) overview, data processing techniques, google's spanner system, system design efficiency, and data center management challenges.", 'chapters': [{'end': 66.919, 'segs': [{'end': 48.97, 'src': 'embed', 'start': 4.957, 'weight': 0, 'content': [{'end': 5.557, 'text': 'Okay Can you hear me? Okay.', 'start': 4.957, 'duration': 0.6}, {'end': 35.206, 'text': "OK, welcome to, I guess this is EE380, but it's also been sort of overridden with our distinguished lecture series.", 'start': 28.384, 'duration': 6.822}, {'end': 38.607, 'text': "Today's speaker is Jeff Dean of Google.", 'start': 36.306, 'duration': 2.301}, {'end': 44.949, 'text': "And Jeff, I don't want to use up all his time in describing all his accomplishments.", 'start': 39.007, 'duration': 5.942}, {'end': 48.97, 'text': "But I'll say he did get his degree at University of Washington.", 'start': 45.009, 'duration': 3.961}], 'summary': 'Jeff dean from google is the speaker for ee380 lecture series.', 'duration': 44.013, 'max_score': 4.957, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI4957.jpg'}], 'start': 4.957, 'title': "Jeff dean's lecture", 'summary': 'Features jeff dean, a speaker from google, at the ee380 lecture series, highlighting his background and the transition to google in its early days.', 'chapters': [{'end': 66.919, 'start': 4.957, 'title': "Jeff dean's lecture at ee380", 'summary': 'Features jeff dean, a speaker from google, at the ee380 lecture series, highlighting his background and the transition to google in its early days.', 'duration': 61.962, 'highlights': ['Jeff Dean is the speaker at the EE380 lecture series, representing Google.', 'Jeff Dean obtained his degree at the University of Washington, and he transitioned to Google in its early days after working at DEC research lab.']}], 'duration': 61.962, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI4957.jpg', 'highlights': ['Jeff Dean is the speaker at the EE380 lecture series, representing Google.', 'Jeff Dean obtained his degree at the University of Washington, and he transitioned to Google in its early days after working at DEC research lab.']}, {'end': 884.287, 'segs': [{'end': 210.029, 'src': 'embed', 'start': 175.588, 'weight': 1, 'content': [{'end': 178.79, 'text': 'And that scale is increased by about a factor of 1, 000 from 1999 to today.', 'start': 175.588, 'duration': 3.202}, {'end': 186.825, 'text': 'is another important dimension is how many queries you have to handle on a given day.', 'start': 183.121, 'duration': 3.704}, {'end': 188.467, 'text': "And that's also grown by a factor of 1, 000.", 'start': 187.126, 'duration': 1.341}, {'end': 190.529, 'text': 'I have a quick experiment.', 'start': 188.467, 'duration': 2.062}, {'end': 197.217, 'text': 'How many of you use Google regularly in 1999? Yeah, this may be a biased audience.', 'start': 190.729, 'duration': 6.488}, {'end': 210.029, 'text': "How about 2002? 
How about now? OK, yeah, well, so our traffic has grown, but I don't see 1, 000 more hams than I did earlier.", 'start': 197.257, 'duration': 12.772}], 'summary': "Google's scale has increased by a factor of 1,000 in queries and traffic since 1999.", 'duration': 34.441, 'max_score': 175.588, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI175588.jpg'}, {'end': 317.623, 'src': 'embed', 'start': 239.752, 'weight': 0, 'content': [{'end': 246.175, 'text': 'In 1999, we were basically updating our index once a month if we were lucky and once every couple months if something horrible went wrong.', 'start': 239.752, 'duration': 6.423}, {'end': 252.379, 'text': 'And now we have portions of our index we can update within a matter of seconds from calling the page.', 'start': 247.016, 'duration': 5.363}, {'end': 254.36, 'text': "And so that's a pretty substantial improvement.", 'start': 252.639, 'duration': 1.721}, {'end': 260.942, 'text': 'The other important factor for users is, how quickly do you get your responses? This is measured at the server side.', 'start': 255.3, 'duration': 5.642}, {'end': 262.924, 'text': "It doesn't include client side network latency.", 'start': 260.963, 'duration': 1.961}, {'end': 266.445, 'text': "But basically, we've had about a 5x improvement in there.", 'start': 262.984, 'duration': 3.461}, {'end': 273.229, 'text': 'So the difficulty in engineering a retrieval system is, in some sense, the product of all these things.', 'start': 267.926, 'duration': 5.303}, {'end': 275.269, 'text': "Because you're dealing with larger indices.", 'start': 273.329, 'duration': 1.94}, {'end': 279.011, 'text': "You're trying to handle more queries with more information per document.", 'start': 275.29, 'duration': 3.721}, {'end': 280.772, 'text': "And you're trying to update it more often.", 'start': 279.451, 'duration': 1.321}, {'end': 282.193, 'text': "And you're trying to do it faster.", 'start': 281.112, 'duration': 1.081}, {'end': 291.159, 'text': "Now, one thing that's really helped us a lot is that we've been able to use more machines and faster machines since 1999.", 'start': 283.173, 'duration': 7.986}, {'end': 292.74, 'text': "We've upgraded our hardware a bit.", 'start': 291.159, 'duration': 1.581}, {'end': 297.643, 'text': "And that's given us about 1, 000x improvement in computational oomph.", 'start': 293.961, 'duration': 3.682}, {'end': 305.678, 'text': 'The other kind of cool thing about working on search is that a lot of the stuff kind of happens behind the scenes.', 'start': 300.336, 'duration': 5.342}, {'end': 311.1, 'text': "And we don't necessarily change the user interface every time we change kind of how the guts of our search system work.", 'start': 306.058, 'duration': 5.042}, {'end': 317.623, 'text': "And so over the last 11 years, we've rolled out about seven very significant revisions to how our search system work.", 'start': 311.52, 'duration': 6.103}], 'summary': 'Search index updates improved by 5x, hardware upgrade led to 1000x computational improvement, and 7 significant revisions in 11 years.', 'duration': 77.871, 'max_score': 239.752, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI239752.jpg'}, {'end': 534.602, 'src': 'embed', 'start': 508.194, 'weight': 6, 'content': [{'end': 512.315, 'text': 'OK, so now that Google was a real company, we decided we should build our own hardware.', 'start': 508.194, 'duration': 4.121}, {'end': 522.938, 
'text': 'And we were going to live off the commodity hardware curve that was driving a lot of the plummeting prices for desktop computers.', 'start': 512.955, 'duration': 9.983}, {'end': 528.4, 'text': 'So we actually decided that we would build our own hardware from these commodity components.', 'start': 523.499, 'duration': 4.901}, {'end': 534.602, 'text': 'The commodity components were great because they were really low price for what you got.', 'start': 529.52, 'duration': 5.082}], 'summary': 'Google decided to build hardware using low-cost commodity components.', 'duration': 26.408, 'max_score': 508.194, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI508194.jpg'}, {'end': 636.603, 'src': 'embed', 'start': 599.283, 'weight': 7, 'content': [{'end': 600.964, 'text': "So I said, OK, let's write an ad system.", 'start': 599.283, 'duration': 1.681}, {'end': 606.326, 'text': "So I'm not going to talk much about it in the rest of the talk, but there's all kinds of interesting features in the ad system.", 'start': 601.044, 'duration': 5.282}, {'end': 613.049, 'text': 'You can view the advertising system as really another form of information retrieval, with some additional kinds of constraints,', 'start': 606.906, 'duration': 6.143}, {'end': 617.811, 'text': 'like budgets for advertisers and cost per click, metrics and so on.', 'start': 613.049, 'duration': 4.762}, {'end': 620.112, 'text': "But I'm not going to really dwell on it in this talk.", 'start': 617.911, 'duration': 2.201}, {'end': 627.096, 'text': 'And in order to add more search capacity in a system,', 'start': 622.453, 'duration': 4.643}, {'end': 636.603, 'text': 'you essentially take the index data and you replicate it so that you have a whole bunch of machines that can deal with index shard 0 and a whole bunch of machines that can deal with index shard 1..', 'start': 627.096, 'duration': 9.507}], 'summary': 'Ad system features include budget constraints and cost per click metrics, expanding search capacity through index replication.', 'duration': 37.32, 'max_score': 599.283, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI599283.jpg'}, {'end': 838.46, 'src': 'embed', 'start': 813.105, 'weight': 8, 'content': [{'end': 818.848, 'text': 'And you would essentially then end up with data structures that you could use to serve the index shards.', 'start': 813.105, 'duration': 5.743}, {'end': 823.411, 'text': "We didn't actually have checksumming of the raw data.", 'start': 821.11, 'duration': 2.301}, {'end': 827.814, 'text': "And the machines we bought at that time consumer class machines typically didn't have.", 'start': 823.591, 'duration': 4.223}, {'end': 832.056, 'text': "not only didn't they have ECC, they didn't even have parity in their memory.", 'start': 827.814, 'duration': 4.242}, {'end': 838.46, 'text': "So a rather frustrating thing when you're sorting a terabyte of data without any parity is that it ends up mostly sorted.", 'start': 833.037, 'duration': 5.423}], 'summary': 'Data structures created for serving index shards; lack of checksumming and parity in consumer-class machines led to frustration while sorting terabytes of data.', 'duration': 25.355, 'max_score': 813.105, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI813105.jpg'}], 'start': 66.919, 'title': "Evolution of google's systems and building own hardware", 'summary': "Details the evolution of 
google's systems, showcasing a 1,000x improvement in computational power and a 5x improvement in response time, and discusses google's decision to build their own hardware, challenges faced in the hardware design, development of an ad system, and implementation of caching for web search with insights on cache hit rates and query latency.", 'chapters': [{'end': 505.466, 'start': 66.919, 'title': "Evolution of google's systems", 'summary': "Details the evolution of google's systems, including the growth in indexing documents and handling queries, the increase in update latency and response time, the improvement in computational power, and the significant revisions to the search system over 11 years, showcasing a 1,000x improvement in computational oomph and a 5x improvement in response time.", 'duration': 438.547, 'highlights': ["The update latency of Google's index has substantially improved, with portions of the index now being updated within a matter of seconds from calling the page.", 'Google has experienced about a 5x improvement in response time for users, measured at the server side, over the years.', "The amount of information kept in Google's index today is about three times as much as it was per document in 1999.", 'The scale of documents indexed by Google has increased by about a factor of 1,000 from 1999 to today.', 'The traffic and queries handled by Google have also grown by a factor of 1,000 over the years.', 'Google has achieved a 1,000x improvement in computational oomph by using more and faster machines since 1999.', 'Over the last 11 years, Google has rolled out about seven significant revisions to its search system, resulting in a 1,000x improvement in computational oomph and a 5x improvement in response time.', 'One of the early innovations by Google was the query-dependent snippet generation, using the words in the query to decide the summary of the document shown in the search results.']}, {'end': 884.287, 'start': 508.194, 'title': 'Building own hardware and improving search capacity', 'summary': "Discusses google's decision to build their own hardware using commodity components, challenges faced in the hardware design, the development of an ad system, and the implementation of caching for web search with insights on cache hit rates and query latency.", 'duration': 376.093, 'highlights': ["Google's decision to build their own hardware using commodity components Google decided to build their own hardware to live off the commodity hardware curve and bought components at low prices, assembled them, and faced challenges with the design including shared power supply and failure modes.", "Development of an ad system and search capacity improvement Google's early focus on developing an ad system to improve search capacity by replicating index data and introducing cache servers for web search, resulting in cache hit rates ranging from 30% to 60% and lower query latency.", 'Challenges faced in indexing system design and data sorting Challenges faced in the indexing system design including the absence of ECC and parity in consumer-class machines, leading to difficulties in data sorting and the development of a file abstraction with checksums for data integrity.']}], 'duration': 817.368, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI66919.jpg', 'highlights': ['Google has achieved a 1,000x improvement in computational oomph by using more and faster machines since 1999.', 'The scale of documents indexed by Google has 
increased by about a factor of 1,000 from 1999 to today.', 'The traffic and queries handled by Google have also grown by a factor of 1,000 over the years.', 'Google has experienced about a 5x improvement in response time for users, measured at the server side, over the years.', "The update latency of Google's index has substantially improved, with portions of the index now being updated within a matter of seconds from calling the page.", 'Google has rolled out about seven significant revisions to its search system, resulting in a 1,000x improvement in computational oomph and a 5x improvement in response time over the last 11 years.', "Google's decision to build their own hardware using commodity components to live off the commodity hardware curve and bought components at low prices, assembled them, and faced challenges with the design including shared power supply and failure modes.", "Development of an ad system and search capacity improvement Google's early focus on developing an ad system to improve search capacity by replicating index data and introducing cache servers for web search, resulting in cache hit rates ranging from 30% to 60% and lower query latency.", 'Challenges faced in indexing system design and data sorting Challenges faced in the indexing system design including the absence of ECC and parity in consumer-class machines, leading to difficulties in data sorting and the development of a file abstraction with checksums for data integrity.']}, {'end': 1480.539, 'segs': [{'end': 990.818, 'src': 'embed', 'start': 935.322, 'weight': 0, 'content': [{'end': 939.624, 'text': 'So our incentives were to pack as many machines as we possibly could into these square feet.', 'start': 935.322, 'duration': 4.302}, {'end': 942.806, 'text': 'And we often had to help them a little bit with some cooling.', 'start': 940.485, 'duration': 2.321}, {'end': 953.544, 'text': 'As a consequence, we actually got pretty good at moving out of bankrupt service providers and into other ones.', 'start': 945.547, 'duration': 7.997}, {'end': 954.825, 'text': 'You can get pretty efficient at it.', 'start': 953.584, 'duration': 1.241}, {'end': 957.306, 'text': 'You can have the racks all ready to go.', 'start': 954.885, 'duration': 2.421}, {'end': 963.05, 'text': 'You just wheel them in, and then you just kind of cable together the top of rack switches, and away you go.', 'start': 957.346, 'duration': 5.704}, {'end': 978.605, 'text': 'So one of the important things in this period of 99 to 2001 is we were really growing two of those dimensions at the same time.', 'start': 963.07, 'duration': 15.535}, {'end': 982.93, 'text': 'The index grew by a factor of 20 or so over that period.', 'start': 978.625, 'duration': 4.305}, {'end': 990.818, 'text': 'At the same time we were getting 15% 20% traffic increases per month and signing big deals.', 'start': 983.35, 'duration': 7.468}], 'summary': 'Efficiently packed machines, 20x index growth, 15-20% traffic increase per month', 'duration': 55.496, 'max_score': 935.322, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI935322.jpg'}, {'end': 1246.347, 'src': 'embed', 'start': 1219.32, 'weight': 2, 'content': [{'end': 1226.805, 'text': "And you also get a very big decrease in query latency, because you're not waiting for mechanical disks, and especially at the tail.", 'start': 1219.32, 'duration': 7.485}, {'end': 1231.25, 'text': 'Really expensive queries in a disk-based indexing system.', 'start': 1228.406, 'duration': 
2.844}, {'end': 1235.315, 'text': 'This was our canonical example that caused us all kinds of headaches.', 'start': 1232.191, 'duration': 3.124}, {'end': 1239.539, 'text': 'We were looking through query logs and trying to find what queries were really expensive.', 'start': 1235.935, 'duration': 3.604}, {'end': 1246.347, 'text': 'And this one was just orders of magnitude more than the rest of the queries in the sample of queries we were looking at.', 'start': 1239.58, 'duration': 6.767}], 'summary': 'Switching to disk-based indexing system led to significant decrease in query latency, particularly for very expensive queries, which were orders of magnitude slower than the rest.', 'duration': 27.027, 'max_score': 1219.32, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI1219320.jpg'}, {'end': 1319.393, 'src': 'embed', 'start': 1285.351, 'weight': 3, 'content': [{'end': 1289.352, 'text': 'You just kind of move to a different position in the posting list in memory and away you go.', 'start': 1285.351, 'duration': 4.001}, {'end': 1297.255, 'text': 'Now, when you have the index system in memory, there are two main issues that really kind of bite you,', 'start': 1290.712, 'duration': 6.543}, {'end': 1299.275, 'text': "as you're first starting to deploy such a system.", 'start': 1297.255, 'duration': 2.02}, {'end': 1300.556, 'text': 'One is variance.', 'start': 1299.815, 'duration': 0.741}, {'end': 1304.497, 'text': 'So the query is now going to touch thousands of machines, not just dozens.', 'start': 1301.256, 'duration': 3.241}, {'end': 1310.346, 'text': 'um, and so things like randomized cron jobs can cause you trouble.', 'start': 1305.482, 'duration': 4.864}, {'end': 1311.827, 'text': 'and the reason?', 'start': 1310.346, 'duration': 1.481}, {'end': 1319.393, 'text': "uh, our operations folks had decided that we're going to have cron jobs that run every five minutes on the machine to do various kinds of housekeeping things,", 'start': 1311.827, 'duration': 7.566}], 'summary': 'Moving to index system in memory raises variance and increases machine touchpoints', 'duration': 34.042, 'max_score': 1285.351, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI1285351.jpg'}], 'start': 885.408, 'title': 'Scaling data centers and in-memory indexing system design', 'summary': 'Details the challenges and strategies in scaling data centers and index size, highlighting the dense computational power per rack, 20x growth in index size, 15-20% monthly traffic increase, and the need for software improvements. 
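
A minimal sketch of the sharding-plus-replication scheme described above, where the index is split into shards and each shard gains more identical replicas as traffic grows; the shard count, replica count, hashing scheme, and server names below are illustrative assumptions, not Google's actual serving setup:

```python
import hashlib
import random

# Illustrative values only: a real deployment had thousands of shards/replicas.
NUM_SHARDS = 4
REPLICAS_PER_SHARD = 3

# Each shard has several identical replicas; any one of them can serve a request.
replicas = {
    shard: [f"index-server-{shard}-{r}" for r in range(REPLICAS_PER_SHARD)]
    for shard in range(NUM_SHARDS)
}

def shard_for_doc(doc_id):
    """Assign a document to a shard by hashing its id (a doc-partitioned index)."""
    return int(hashlib.md5(doc_id.encode()).hexdigest(), 16) % NUM_SHARDS

def route_query(query):
    """A query fans out to one replica of *every* shard; results are then merged."""
    return {shard: random.choice(servers) for shard, servers in replicas.items()}

if __name__ == "__main__":
    print(shard_for_doc("http://example.com/page1"))
    print(route_query("stanford ee380"))
```
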
additionally, it explains the design of an in-memory indexing system, emphasizing increased throughput and decreased query latency while addressing challenges of variance and availability, along with strategies to mitigate potential issues.', 'chapters': [{'end': 1154.975, 'start': 885.408, 'title': 'Scaling data centers and index size', 'summary': 'Details the challenges and strategies in scaling data centers and index size, highlighting the dense computational power per rack, 20x growth in index size, 15-20% monthly traffic increase, and the need for software improvements to handle traffic surge.', 'duration': 269.567, 'highlights': ['The index grew by a factor of 20 or so over that period, with 15-20% traffic increases per month and signing big deals, including a deal with Yahoo that doubled the traffic overnight.', 'Challenges included the need to pack as many machines as possible into hosting centers charged by square footage and the need for software improvements to handle the surge in traffic.', 'Constant increase in index size required fine partitioning to maintain response times, necessitating the addition of more index partitions and replicas as traffic grew.', 'Challenges with index sharding included disk seek limitations and the need for tricks such as building auxiliary data structures and index compression to optimize performance with increasing machines.']}, {'end': 1480.539, 'start': 1154.975, 'title': 'In-memory indexing system design', 'summary': 'Explains the design of an in-memory indexing system, highlighting its benefits of increased throughput and decreased query latency while addressing challenges of variance and availability, along with strategies to mitigate potential issues.', 'duration': 325.564, 'highlights': ['In-memory index system provides a big increase in throughput and a very big decrease in query latency. The system offers a significant improvement in throughput due to the elimination of disk seeks, resulting in a substantial decrease in query latency, especially for expensive queries in a disk-based indexing system.', 'Challenges of variance and availability are encountered in deploying an in-memory indexing system. Deploying an in-memory indexing system introduces challenges related to variance, where queries touch thousands of machines, and availability, due to the reduction in replicas and potential occurrences of machine failures.', 'Strategies for mitigating issues include optimizing cron job scheduling and implementing canary requests for detecting potential query failures. 
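
The canary-request strategy mentioned here can be pictured with a small sketch: send a potentially dangerous query to one or two backends first, log and reject it if they crash, and only then fan it out to thousands of servers. The function and exception names below are hypothetical, and the "crash" is simulated:

```python
# Hypothetical sketch of the canary-request pattern: try a query on a couple of
# backends first, and only fan it out to all servers if the canaries survive.

class ServerCrash(Exception):
    """Stands in for a backend process dying on a query that trips a bug."""

def send_to_backend(server, query):
    # Simulated backend: a "poisonous" query crashes the server.
    if "poison" in query:
        raise ServerCrash(server)
    return f"{server}: results for {query!r}"

def execute_with_canaries(query, servers, num_canaries=2):
    # Step 1: send the query to a small number of canary servers first.
    for canary in servers[:num_canaries]:
        try:
            send_to_backend(canary, query)
        except ServerCrash:
            # Log the query so it can be investigated, and reject it up front
            # so a retry cannot take down thousands of machines.
            print(f"rejecting query {query!r}: it crashed canary {canary}")
            return None
    # Step 2: the canaries survived, so fan out to the full set of servers.
    return [send_to_backend(s, query) for s in servers]

if __name__ == "__main__":
    backends = [f"index-server-{i}" for i in range(6)]
    print(execute_with_canaries("stanford lectures", backends))
    print(execute_with_canaries("poison query", backends))
```
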
Optimizing cron job scheduling by avoiding randomization and implementing canary requests for detecting potential query failures are essential strategies for mitigating issues in an in-memory indexing system.']}], 'duration': 595.131, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI885408.jpg', 'highlights': ['20x growth in index size with 15-20% monthly traffic increase', 'Challenges in packing as many machines as possible into hosting centers', 'In-memory indexing system provides significant increase in throughput and decrease in query latency', 'Challenges of variance and availability in deploying in-memory indexing system', 'Optimizing cron job scheduling and implementing canary requests are essential strategies for mitigating issues']}, {'end': 2213.815, 'segs': [{'end': 1545.543, 'src': 'heatmap', 'start': 1481.2, 'weight': 0, 'content': [{'end': 1487.162, 'text': 'And the nice thing is you then crash only a few servers, not thousands of them.', 'start': 1481.2, 'duration': 5.962}, {'end': 1492.105, 'text': "It's also a good idea to log what the query was there so that you can then investigate further.", 'start': 1487.563, 'duration': 4.542}, {'end': 1496.282, 'text': "Oh, we didn't really have one.", 'start': 1495.402, 'duration': 0.88}, {'end': 1502.725, 'text': 'I mean, we had a kind of a dynamic one in memory that we would keep in the balancers so that you would only crash a few backends.', 'start': 1496.302, 'duration': 6.423}, {'end': 1507.067, 'text': "And then if the user hit reload, then you'd say, OK, I'm going to reject it beforehand.", 'start': 1502.765, 'duration': 4.302}, {'end': 1510.864, 'text': 'No, not usually.', 'start': 1509.884, 'duration': 0.98}, {'end': 1515.265, 'text': "It's just some bug in some new release you've rolled out.", 'start': 1511.064, 'duration': 4.201}, {'end': 1518.446, 'text': 'You also remember rolling out new releases really fast.', 'start': 1515.645, 'duration': 2.801}, {'end': 1524.487, 'text': "And you've replayed all of last month's logs or a large fraction of last month's logs to make sure it doesn't crash.", 'start': 1519.006, 'duration': 5.481}, {'end': 1529.048, 'text': "But you're always getting new kinds of queries and things like that.", 'start': 1525.007, 'duration': 4.041}, {'end': 1535.99, 'text': 'So we kind of caught our breath a bit and then redesigned things a fair amount.', 'start': 1529.888, 'duration': 6.102}, {'end': 1545.543, 'text': '2004,. 
this was kind of the first time we were able to rethink things from scratch.', 'start': 1540.84, 'duration': 4.703}], 'summary': 'Implemented measures reduced server crashes to a few, and redesigned system in 2004.', 'duration': 43.287, 'max_score': 1481.2, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI1481200.jpg'}, {'end': 1579.449, 'src': 'embed', 'start': 1552.806, 'weight': 6, 'content': [{'end': 1561.009, 'text': 'So we now have a multi-level tree for query distributions, kind of generalized from what we had before, where we had a fixed number of levels.', 'start': 1552.806, 'duration': 8.203}, {'end': 1567.921, 'text': 'We have the least servers that are able to handle both index and doc requests.', 'start': 1563.254, 'duration': 4.667}, {'end': 1579.449, 'text': 'We have a thing called a repository manager sitting on the side that deals with this index is made up of a whole bunch of shards, thousands of shards.', 'start': 1570.365, 'duration': 9.084}], 'summary': 'Multi-level query distribution tree, least servers handle both index and doc requests, repository manager manages thousands of shards.', 'duration': 26.643, 'max_score': 1552.806, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI1552806.jpg'}, {'end': 1626.002, 'src': 'embed', 'start': 1600.996, 'weight': 7, 'content': [{'end': 1608.103, 'text': 'Yeah So we were able to, at that point, clean up a lot of the abstractions that had evolved over time.', 'start': 1600.996, 'duration': 7.107}, {'end': 1612.547, 'text': 'One of the things we wanted in this system was the ability to do easy experiments.', 'start': 1609.084, 'duration': 3.463}, {'end': 1617.733, 'text': 'So when you have a new ranking idea or a new ranking algorithm you want to try,', 'start': 1613.088, 'duration': 4.645}, {'end': 1623.619, 'text': 'often you need to access some new kind of information that you want to pre-compute.', 'start': 1617.733, 'duration': 5.886}, {'end': 1626.002, 'text': 'for every document in the index.', 'start': 1624.421, 'duration': 1.581}], 'summary': 'Cleaned up abstractions, enabled easy experiments with new ranking algorithms.', 'duration': 25.006, 'max_score': 1600.996, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI1600996.jpg'}, {'end': 1956.471, 'src': 'embed', 'start': 1929.025, 'weight': 4, 'content': [{'end': 1933.169, 'text': 'But it would cost you 3 to 4x in memory, probably.', 'start': 1929.025, 'duration': 4.144}, {'end': 1939.36, 'text': 'So then the system obviously has continued to evolve.', 'start': 1937.179, 'duration': 2.181}, {'end': 1943.963, 'text': 'One of the major changes we made in 2007 was this notion of universal search.', 'start': 1939.52, 'duration': 4.443}, {'end': 1947.145, 'text': 'So previously, we would just search web documents when you went to google.com.', 'start': 1944.063, 'duration': 3.082}, {'end': 1954.59, 'text': "And you'd have to go to these other properties that we developed over time, like books.google.com and news.google.com,", 'start': 1947.906, 'duration': 6.684}, {'end': 1956.471, 'text': 'to search these other kinds of corpora.', 'start': 1954.59, 'duration': 1.881}], 'summary': "Google's system evolved with universal search, increasing memory cost 3 to 4x.", 'duration': 27.446, 'max_score': 1929.025, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI1929025.jpg'}, {'end': 2091.802, 'src': 'embed', 'start': 2060.645, 'weight': 5, 'content': [{'end': 2063.607, 'text': 'And actually, it turns out for different kinds of documents?', 'start': 2060.645, 'duration': 2.962}, {'end': 2069.792, 'text': 'the answer is we do both, depending on the kind of document.', 'start': 2063.607, 'duration': 6.185}, {'end': 2077.417, 'text': "OK So I'm going to switch gears a bit and talk about how our underlying system infrastructure has evolved.", 'start': 2071.012, 'duration': 6.405}, {'end': 2086.498, 'text': 'You know, this is a fairly modern incarnation of our rack design.', 'start': 2081.274, 'duration': 5.224}, {'end': 2088.88, 'text': "We're back to not having cases on our machines.", 'start': 2086.558, 'duration': 2.322}, {'end': 2091.802, 'text': 'You get better airflow, as it turns out.', 'start': 2090.341, 'duration': 1.461}], 'summary': 'System infrastructure has evolved; modern rack design improves airflow.', 'duration': 31.157, 'max_score': 2060.645, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI2060645.jpg'}], 'start': 1481.2, 'title': 'Server crashes, query logging, and google system evolution', 'summary': "Discusses handling server crashes by implementing a dynamic memory-based query system, the importance of logging to prevent future crashes, and the evolution of google's system infrastructure including key changes in 2004 and 2007, addressing performance challenges and ui issues.", 'chapters': [{'end': 1524.487, 'start': 1481.2, 'title': 'Handling server crashes and query logging', 'summary': 'Discusses handling server crashes by implementing a dynamic memory-based query system to minimize the impact of crashes, and the importance of logging and replaying logs to prevent future crashes when rolling out new releases.', 'duration': 43.287, 'highlights': ['Implementing a dynamic memory-based query system to minimize the impact of server crashes.', 'The importance of logging queries to investigate further and prevent future crashes.', "Replaying last month's logs to ensure new releases do not cause crashes.", 'Fast rollout of new releases to address bugs and issues.']}, {'end': 2213.815, 'start': 1525.007, 'title': 'Google system evolution', 'summary': "Discusses the evolution of google's system infrastructure, including the redesign of index servers and doc servers in 2004, the implementation of a multi-level tree for query distributions, the introduction of a new ranking algorithm experimentation process, and the transition to universal search in 2007, addressing performance challenges and ui issues.", 'duration': 688.808, 'highlights': ['The redesign of index servers and doc servers in 2004, including the implementation of a multi-level tree for query distributions and the introduction of a repository manager for handling index shards, allowing for a gradual index switch process and caching of partial results. Redesign of index and doc servers in 2004, implementation of multi-level tree for query distributions, introduction of repository manager, gradual index switch process, caching of partial results.', 'The development of a new ranking algorithm experimentation process, allowing for easy experiments and pre-computation of new information for every document in the index, leading to higher performance and the ability to roll out new ranking algorithms based on traffic fractions. 
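
A rough sketch of rolling out a new ranking algorithm to a fraction of traffic, as described above; the hash-based bucketing, the 1% fraction, and the pre-computed "freshness" signal are assumptions made only for illustration, not the actual mechanism:

```python
import hashlib

# Serve an experimental ranking function to a stable slice of traffic.
EXPERIMENT_FRACTION = 0.01   # illustrative 1% of queries

def baseline_rank(docs):
    return sorted(docs, key=lambda d: d["score"], reverse=True)

def experimental_rank(docs):
    # Pretend the experiment uses an extra per-document signal that was
    # pre-computed for every document in the index ahead of time.
    return sorted(docs, key=lambda d: d["score"] + d.get("freshness", 0.0),
                  reverse=True)

def in_experiment(query_id):
    """Deterministically assign the same queries to the experiment every time."""
    bucket = int(hashlib.sha1(query_id.encode()).hexdigest(), 16) % 10_000
    return bucket < EXPERIMENT_FRACTION * 10_000

def rank(query_id, docs):
    return experimental_rank(docs) if in_experiment(query_id) else baseline_rank(docs)

if __name__ == "__main__":
    docs = [{"url": "a", "score": 1.0, "freshness": 0.5},
            {"url": "b", "score": 1.2}]
    print(rank("query-123", docs))
```
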
Development of new ranking algorithm experimentation process, easy experiments, pre-computation of new information, higher performance, roll out of new ranking algorithms based on traffic fractions.', 'The transition to universal search in 2007, involving the search of different properties at full web traffic levels, addressing performance challenges, and UI issues related to organizing results from different corpora. Transition to universal search in 2007, search of different properties at full web traffic levels, addressing performance challenges, UI issues related to organizing results.', "The evolution of Google's system infrastructure, including the modern incarnation of rack design, dealing with various issues such as individual machine failures, disk drive failures, and long distance link challenges. Evolution of Google's system infrastructure, modern rack design, dealing with machine and disk drive failures, long distance link challenges."]}], 'duration': 732.615, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI1481200.jpg', 'highlights': ['Implementing a dynamic memory-based query system to minimize the impact of server crashes.', 'The importance of logging queries to investigate further and prevent future crashes.', "Replaying last month's logs to ensure new releases do not cause crashes.", 'Fast rollout of new releases to address bugs and issues.', 'The transition to universal search in 2007, involving the search of different properties at full web traffic levels, addressing performance challenges, and UI issues related to organizing results.', "The evolution of Google's system infrastructure, including the modern incarnation of rack design, dealing with various issues such as individual machine failures, disk drive failures, and long distance link challenges.", 'The redesign of index servers and doc servers in 2004, including the implementation of a multi-level tree for query distributions and the introduction of a repository manager for handling index shards, allowing for a gradual index switch process and caching of partial results.', 'The development of a new ranking algorithm experimentation process, allowing for easy experiments and pre-computation of new information for every document in the index, leading to higher performance and the ability to roll out new ranking algorithms based on traffic fractions.']}, {'end': 2555.91, 'segs': [{'end': 2310.396, 'src': 'embed', 'start': 2218.382, 'weight': 2, 'content': [{'end': 2224.689, 'text': 'So in this kind of environment, you really have to have the reliability and availability come from the software, not from the hardware.', 'start': 2218.382, 'duration': 6.307}, {'end': 2232.718, 'text': "And even if you were to spend more money and buy more reliable hardware, at the scale that we're operating, that hardware is still going to fail.", 'start': 2225.33, 'duration': 7.388}, {'end': 2237.364, 'text': 'that reliability and availability to come from the software.', 'start': 2234.58, 'duration': 2.784}, {'end': 2243.191, 'text': "And so I'd actually much rather have three times as many machines that are less reliable,", 'start': 2237.704, 'duration': 5.487}, {'end': 2248.939, 'text': 'because you get a lot more computing oomph per dollar that way.', 'start': 2243.191, 'duration': 5.748}, {'end': 2259.047, 'text': "OK, but assuming you have a lot of machines and you're running in this environment, there are a few things you'd like to be able to do.", 'start': 2252.225, 
'duration': 6.822}, {'end': 2263.969, 'text': "One is you'd like to be able to store data persistently with high availability.", 'start': 2259.207, 'duration': 4.762}, {'end': 2271.071, 'text': 'So that means not on a single machine, obviously, or maybe not even on a single rack, given the problems we saw previously.', 'start': 2264.029, 'duration': 7.042}, {'end': 2274.412, 'text': "And you'd like high read and write bandwidth to the data you've stored.", 'start': 2271.431, 'duration': 2.981}, {'end': 2282.863, 'text': 'to have the ability to run large scale computations reliably and without having to deal with individual machine failures.', 'start': 2276.753, 'duration': 6.11}, {'end': 2293.127, 'text': 'Let me briefly talk about the Google file system that was developed at Google in 2003.', 'start': 2286.224, 'duration': 6.903}, {'end': 2303.492, 'text': 'Essentially, it was a file system optimized for the kinds of workloads we had, which were very large files, so typically a whole record of files,', 'start': 2293.127, 'duration': 10.365}, {'end': 2305.453, 'text': "of documents we'd crawled.", 'start': 2303.492, 'duration': 1.961}, {'end': 2310.396, 'text': 'So the files were hundreds of megabytes, gigabytes in size, not tiny 5K files.', 'start': 2305.513, 'duration': 4.883}], 'summary': 'Reliability and availability from software, prefer more machines for computing oomph and high availability storage with high read/write bandwidth.', 'duration': 92.014, 'max_score': 2218.382, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI2218382.jpg'}, {'end': 2361.254, 'src': 'embed', 'start': 2334.522, 'weight': 1, 'content': [{'end': 2338.764, 'text': 'And clients, when they actually wanted to read the data in the file, would talk directly to the appropriate chunk server.', 'start': 2334.522, 'duration': 4.242}, {'end': 2344.253, 'text': 'Clients talk to the master to figure out where the data is, but then they read directly from the trunk server.', 'start': 2340.504, 'duration': 3.749}, {'end': 2350.046, 'text': 'And the files are broken into chunks of roughly 64 megabytes in size.', 'start': 2345.496, 'duration': 4.55}, {'end': 2356.873, 'text': "And because we want to tolerate machine failures, we're going to replicate chunks across multiple machines.", 'start': 2351.451, 'duration': 5.422}, {'end': 2361.254, 'text': "So we're going to make multiple identical replicas of the same chunk, typically three.", 'start': 2356.893, 'duration': 4.361}], 'summary': 'Data files are broken into 64mb chunks and replicated three times for fault tolerance.', 'duration': 26.732, 'max_score': 2334.522, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI2334522.jpg'}, {'end': 2545.525, 'src': 'embed', 'start': 2516.958, 'weight': 0, 'content': [{'end': 2522.746, 'text': 'The good news is that if you parallelize them, you can actually get pretty decent response times for fairly data-intensive tasks.', 'start': 2516.958, 'duration': 5.788}, {'end': 2524.508, 'text': 'Like I want to do something with all the web pages.', 'start': 2522.846, 'duration': 1.662}, {'end': 2531.218, 'text': "If I'm able to parallelize that across 1, 000 machines, then I can do it in three hours instead of three months.", 'start': 2525.029, 'duration': 6.189}, {'end': 2536.321, 'text': "The bad news is there's a lot of issues that make this difficult.", 'start': 2532.799, 'duration': 3.522}, {'end': 2541.063, 'text': 'You have 
to communicate between the different pieces of your paralyzed job.', 'start': 2537.401, 'duration': 3.662}, {'end': 2545.525, 'text': "You have to coordinate, figure out who's going to do what, recover from machine failure somehow.", 'start': 2541.083, 'duration': 4.442}], 'summary': 'Parallelizing tasks across 1,000 machines reduces time from 3 months to 3 hours, but faces communication and coordination challenges.', 'duration': 28.567, 'max_score': 2516.958, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI2516958.jpg'}], 'start': 2218.382, 'title': 'Reliability and availability in software and google file system overview', 'summary': 'Emphasizes the importance of reliability and availability in software for large-scale operations, preferring more machines for increased computing power and the need for high availability and data persistence while running large-scale computations. it also provides an overview of the google file system (gfs) developed in 2003, highlighting its optimized design for handling very large files, use of chunk servers, replication of chunks, and parallelization of computation across thousands of machines for faster data processing.', 'chapters': [{'end': 2282.863, 'start': 2218.382, 'title': 'Reliability and availability in software', 'summary': 'Emphasizes the importance of reliability and availability in software for large-scale operations, preferring more machines for increased computing power and the need for high availability and data persistence while running large-scale computations.', 'duration': 64.481, 'highlights': ['The importance of reliability and availability in software for large-scale operations, even with more reliable hardware, at scale, hardware failures still occur.', 'Preferring more machines that are less reliable to get more computing power per dollar.', 'The need for high availability and data persistence, not on a single machine or rack, and high read and write bandwidth for stored data.', 'The ability to run large-scale computations reliably without dealing with individual machine failures.']}, {'end': 2555.91, 'start': 2286.224, 'title': 'Google file system overview', 'summary': 'Provides an overview of the google file system (gfs) developed in 2003, highlighting its optimized design for handling very large files, use of chunk servers, replication of chunks, and parallelization of computation across thousands of machines for faster data processing.', 'duration': 269.686, 'highlights': ['The Google File System (GFS) was optimized for handling very large files, typically in the hundreds of megabytes to gigabytes in size, and not tiny 5K files. The GFS was designed to handle very large files, optimized for workloads involving files of hundreds of megabytes to gigabytes in size, rather than smaller files.', 'The data in the file was spread across numerous machines in the data center, with a master managing the file system metadata and the actual data being managed by chunk servers. The data in the file was distributed across multiple machines in the data center, with a master managing file system metadata and chunk servers handling the actual data.', 'Files were broken into chunks of roughly 64 megabytes in size and replicated across multiple machines to tolerate machine failures, typically three replicas per chunk. 
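
A toy sketch of the GFS bookkeeping summarized above: files are split into roughly 64 MB chunks, each chunk is replicated on about three chunk servers, and clients ask the master for chunk locations before reading directly from a chunk server. The names and placement logic are invented for the example and omit everything else GFS does:

```python
import random

CHUNK_SIZE = 64 * 1024 * 1024   # ~64 MB chunks, as described in the talk
REPLICATION = 3                 # typically three identical replicas per chunk

chunk_servers = [f"chunkserver-{i}" for i in range(8)]
metadata = {}   # master's view: file name -> list of (chunk_id, replica servers)

def create_file(name, size_bytes):
    """Master-side metadata update: split the file into chunks and place replicas."""
    num_chunks = -(-size_bytes // CHUNK_SIZE)   # ceiling division
    metadata[name] = [
        (f"{name}#chunk{i}", random.sample(chunk_servers, REPLICATION))
        for i in range(num_chunks)
    ]

def locate(name, offset):
    """A client asks the master where an offset lives, then reads from a replica directly."""
    chunk_id, replica_servers = metadata[name][offset // CHUNK_SIZE]
    return chunk_id, replica_servers

if __name__ == "__main__":
    create_file("crawl/pages-00001", size_bytes=5 * 10**9)   # a multi-gigabyte file
    print(len(metadata["crawl/pages-00001"]), "chunks")
    print(locate("crawl/pages-00001", offset=200 * 1024 * 1024))
```
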
Files were divided into chunks of approximately 64 megabytes each and replicated across multiple machines, typically with three identical replicas per chunk to ensure fault tolerance.', 'Parallelization of computation across 1,000 machines allowed for faster data processing, reducing the time for data-intensive tasks from months to hours. Parallelization of computation across 1,000 machines significantly accelerated data processing, reducing the time for data-intensive tasks from months to hours.', 'Challenges in parallelization included communication between different components, coordination, recovery from machine failure, and status reporting. Challenges in parallelization included communication, coordination, recovery from machine failure, and status reporting, which were essential for efficient parallel processing.']}], 'duration': 337.528, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI2218382.jpg', 'highlights': ['Parallelization of computation across 1,000 machines significantly accelerated data processing.', 'Files were divided into chunks of approximately 64 megabytes each and replicated across multiple machines.', 'The Google File System (GFS) was optimized for handling very large files, typically in the hundreds of megabytes to gigabytes in size.', 'The importance of reliability and availability in software for large-scale operations, even with more reliable hardware, at scale, hardware failures still occur.', 'Preferring more machines that are less reliable to get more computing power per dollar.', 'The need for high availability and data persistence, not on a single machine or rack, and high read and write bandwidth for stored data.']}, {'end': 3244.878, 'segs': [{'end': 2598.782, 'src': 'embed', 'start': 2574.253, 'weight': 4, 'content': [{'end': 2586.581, 'text': "One of the things about an indexing system is it starts with raw page contents on disk and then it goes through a whole bunch of phases to kind of compute intermediate data structures that you're eventually going to bake into either the index serving system or the doc serving system.", 'start': 2574.253, 'duration': 12.328}, {'end': 2591.318, 'text': 'over time,', 'start': 2590.838, 'duration': 0.48}, {'end': 2598.782, 'text': 'you kind of accrete more and more of these phases to compute other kinds of derived information that you either know is useful in your ranking algorithm or,', 'start': 2591.318, 'duration': 7.464}], 'summary': 'Indexing system processes raw page contents to compute intermediate data structures for index serving and doc serving systems.', 'duration': 24.529, 'max_score': 2574.253, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI2574253.jpg'}, {'end': 2643.859, 'src': 'embed', 'start': 2609.527, 'weight': 3, 'content': [{'end': 2613.93, 'text': 'where we hand parallelized it across a bunch of different chunks of input.', 'start': 2609.527, 'duration': 4.403}, {'end': 2618.374, 'text': 'we would have handwritten checkpointing code to basically deal with fault tolerance.', 'start': 2613.93, 'duration': 4.444}, {'end': 2626.641, 'text': 'So if a machine crashed, you would revert to the last checkpoint that that machine had saved and restart the computation from there and roll forward.', 'start': 2618.394, 'duration': 8.247}, {'end': 2643.859, 'text': "So eventually we squinted at all these different phases in our indexing system and said a lot of these look pretty similar in that 
they're extracting something from some input and transforming it in some way and then producing some output.", 'start': 2629.969, 'duration': 13.89}], 'summary': 'Parallelized input processing with fault tolerance for indexing system.', 'duration': 34.332, 'max_score': 2609.527, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI2609527.jpg'}, {'end': 2704.78, 'src': 'embed', 'start': 2676.556, 'weight': 5, 'content': [{'end': 2681.779, 'text': 'and let the library deal with a lot of these issues and use that library for all kinds of different computations.', 'start': 2676.556, 'duration': 5.223}, {'end': 2691.625, 'text': "So the typical problem you're trying to solve in MapReduce is you want to read a lot of data, extract something you care about with a map function.", 'start': 2682.319, 'duration': 9.306}, {'end': 2694.806, 'text': 'The library internally will shuffle and sort it.', 'start': 2692.905, 'duration': 1.901}, {'end': 2700.47, 'text': 'And then you want to apply a reduce function that says how you want to combine data that you generated in the map phase.', 'start': 2695.487, 'duration': 4.983}, {'end': 2704.78, 'text': 'And so this outline is basically the MapReduce programming model.', 'start': 2701.858, 'duration': 2.922}], 'summary': 'Mapreduce programming model involves reading and processing large volumes of data using map and reduce functions.', 'duration': 28.224, 'max_score': 2676.556, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI2676556.jpg'}, {'end': 3082.814, 'src': 'embed', 'start': 3051.302, 'weight': 0, 'content': [{'end': 3055.144, 'text': "So you're actually shuffling in parallel with running other map task computations.", 'start': 3051.302, 'duration': 3.842}, {'end': 3059.326, 'text': 'And so you end up with a lot of pipelining in this system.', 'start': 3056.685, 'duration': 2.641}, {'end': 3062.568, 'text': 'And you get better dynamic load balancing by having finer granularity tasks.', 'start': 3059.346, 'duration': 3.222}, {'end': 3067.439, 'text': "It's actually pretty fault tolerant.", 'start': 3065.838, 'duration': 1.601}, {'end': 3082.814, 'text': 'so you can actually handle a lot of the issues when a machine fails by just re-executing the work it was supposed to have done and do a little bookkeeping to keep track of who actually now has the current output for map task 7 or whatever.', 'start': 3067.439, 'duration': 15.375}], 'summary': 'Parallel shuffling enables better load balancing and fault tolerance in the system.', 'duration': 31.512, 'max_score': 3051.302, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI3051302.jpg'}, {'end': 3244.878, 'src': 'embed', 'start': 3173.304, 'weight': 1, 'content': [{'end': 3176.247, 'text': 'Another important thing in our case is a locality optimization.', 'start': 3173.304, 'duration': 2.943}, {'end': 3184.315, 'text': "So the scheduling policy is such that we're going to ask GFS for locations of replicas of input file blocks.", 'start': 3176.307, 'duration': 8.008}, {'end': 3194.385, 'text': "And we're going to try to schedule map tasks so that they're local to that particular machine, or at least on the same rack.", 'start': 3185.096, 'duration': 9.289}, {'end': 3204.157, 'text': "So here's some stats about how MapReduce has been used over time within Google.", 'start': 3197.215, 'duration': 6.942}, {'end': 3206.378, 'text': "I won't really 
read it all.", 'start': 3205.378, 'duration': 1}, {'end': 3215.321, 'text': "We're roughly processing an exabyte a month now, running four million jobs a month.", 'start': 3206.498, 'duration': 8.823}, {'end': 3218.422, 'text': 'Kind of interesting.', 'start': 3217.922, 'duration': 0.5}, {'end': 3219.002, 'text': "I don't know.", 'start': 3218.722, 'duration': 0.28}, {'end': 3228.603, 'text': '946, 000 terabytes.', 'start': 3227.922, 'duration': 0.681}, {'end': 3229.504, 'text': "That's a petabyte.", 'start': 3228.623, 'duration': 0.881}, {'end': 3234.429, 'text': 'No An exabyte is nearly an exabyte.', 'start': 3229.584, 'duration': 4.845}, {'end': 3235.931, 'text': 'Nearly an exabyte.', 'start': 3235.15, 'duration': 0.781}, {'end': 3236.812, 'text': 'Exabyte Yes.', 'start': 3236.011, 'duration': 0.801}, {'end': 3240.656, 'text': 'Yes August of 2004 was three terabytes.', 'start': 3238.394, 'duration': 2.262}, {'end': 3244.878, 'text': "I'm not going to touch on this too much.", 'start': 3243.598, 'duration': 1.28}], 'summary': 'Google processes nearly an exabyte a month, running four million jobs.', 'duration': 71.574, 'max_score': 3173.304, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI3173304.jpg'}], 'start': 2555.91, 'title': 'Data processing techniques', 'summary': 'Discusses rewriting an indexing system in 2003 for easier incorporation of new information through handwritten parallel computation and improved fault tolerance. it also introduces the mapreduce programming model, which simplifies data processing, achieves fault tolerance, dynamic load balancing, and handles large-scale data processing, processing approximately an exabyte a month and running four million jobs a month at google.', 'chapters': [{'end': 2626.641, 'start': 2555.91, 'title': 'Rewriting indexing system for efficiency', 'summary': 'Discusses the process of rewriting an indexing system in 2003 to make it easier to incorporate new information, involving handwritten parallel computation and the need for improved fault tolerance.', 'duration': 70.731, 'highlights': ['The process of rewriting the indexing system in 2003 to incorporate new information involved handwritten parallel computation and the need for improved fault tolerance, including handwritten checkpointing code to deal with machine crashes and reverting to the last checkpoint (relevant)', 'The indexing system starts with raw page contents on disk, goes through phases to compute intermediate data structures, and accretes more phases over time to compute various derived information for ranking algorithms or experimental purposes (relevant)']}, {'end': 3244.878, 'start': 2629.969, 'title': 'Introduction to mapreduce', 'summary': 'Introduces the mapreduce programming model, which simplifies data processing by expressing computations in a map and reduce functions, hiding messy details, and leveraging a library, with a real example from a production maps system, achieving fault tolerance, dynamic load balancing, and handling large-scale data processing, processing approximately an exabyte a month and running four million jobs a month at google.', 'duration': 614.909, 'highlights': ['The MapReduce programming model simplifies data processing by expressing computations in a map and reduce functions, hiding messy details in a library, and achieving fault tolerance, dynamic load balancing, and handling large-scale data processing. 
Simplified data processing through map and reduce functions, hid messy details in a library, achieved fault tolerance, dynamic load balancing, and handled large-scale data processing.', 'A real example from a production maps system is used to illustrate the process of generating map tiles from a whole bunch of lists of geographic features, achieving fault tolerance in large computations, and handling approximately an exabyte of data processing a month at Google. Illustrated the process of generating map tiles, achieving fault tolerance, and handling approximately an exabyte of data processing a month at Google.', 'The MapReduce system handles fault tolerance efficiently, allowing for fast recovery from machine failures by distributing tasks to other machines, and it also enables better dynamic load balancing by having finer granularity tasks. Efficiently handled fault tolerance by distributing tasks to other machines for fast recovery, enabled better dynamic load balancing with finer granularity tasks.', 'The use of backup tasks and locality optimization are automatically handled by the MapReduce framework, improving completion time tremendously and optimizing task scheduling for better performance. Automatically handled backup tasks and locality optimization, improving completion time and task scheduling for better performance.', 'Google processes approximately an exabyte of data a month and runs four million jobs a month using MapReduce, demonstrating its large-scale data processing capabilities. Processed an exabyte of data a month, ran four million jobs a month, demonstrating large-scale data processing capabilities.']}], 'duration': 688.968, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI2555910.jpg', 'highlights': ['The MapReduce system efficiently handles fault tolerance by distributing tasks to other machines for fast recovery and enables better dynamic load balancing with finer granularity tasks.', 'The use of backup tasks and locality optimization are automatically handled by the MapReduce framework, improving completion time tremendously and optimizing task scheduling for better performance.', 'Google processes approximately an exabyte of data a month and runs four million jobs a month using MapReduce, demonstrating its large-scale data processing capabilities.', 'The process of rewriting the indexing system in 2003 to incorporate new information involved handwritten parallel computation and the need for improved fault tolerance, including handwritten checkpointing code to deal with machine crashes and reverting to the last checkpoint.', 'The indexing system starts with raw page contents on disk, goes through phases to compute intermediate data structures, and accretes more phases over time to compute various derived information for ranking algorithms or experimental purposes.', 'The MapReduce programming model simplifies data processing by expressing computations in map and reduce functions, hiding messy details in a library, and achieving fault tolerance, dynamic load balancing, and handling large-scale data processing.']}, {'end': 3630.873, 'segs': [{'end': 3287.133, 'src': 'embed', 'start': 3245.879, 'weight': 0, 'content': [{'end': 3250.08, 'text': "I'll just briefly mention a current system I'm working on with seven or eight other people.", 'start': 3245.879, 'duration': 4.201}, {'end': 3257.882, 'text': "Basically, it's a system called Spanner that is designed to be a piece of infrastructure that runs in a single 
instance across multiple data centers.", 'start': 3250.9, 'duration': 6.982}, {'end': 3264.925, 'text': 'So a lot of the systems we developed at Google run one instance in one data center and then another instance in another data center.', 'start': 3258.203, 'duration': 6.722}, {'end': 3271.527, 'text': "And this is kind of the first system we've really tried to build that spans many data centers at a large scale.", 'start': 3266.765, 'duration': 4.762}, {'end': 3281.61, 'text': 'So it has a single global namespace for data, so that if you decide to move or change the replication of some data or move a copy of some data,', 'start': 3272.927, 'duration': 8.683}, {'end': 3287.133, 'text': "the name of that data doesn't change, which is a pretty important operational property within Google.", 'start': 3281.61, 'duration': 5.523}], 'summary': 'Working on system called spanner, spans many data centers, single global namespace for data.', 'duration': 41.254, 'max_score': 3245.879, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI3245879.jpg'}, {'end': 3357.69, 'src': 'embed', 'start': 3309.423, 'weight': 2, 'content': [{'end': 3314.666, 'text': "And we're working on supporting a mix of strong and weakly consistent operations across data centers.", 'start': 3309.423, 'duration': 5.243}, {'end': 3324.61, 'text': "And the hope is that the system is much more automated, especially when it's moving data across different data centers, than right now,", 'start': 3317.347, 'duration': 7.263}, {'end': 3327.471, 'text': 'which is kind of a fairly manual, labor-intensive process.', 'start': 3324.61, 'duration': 2.861}, {'end': 3336.194, 'text': 'So this is kind of a very broad, high-level view of the zone design of Spanner.', 'start': 3329.592, 'duration': 6.602}, {'end': 3339.195, 'text': "So you're going to have a whole bunch of zones around the world in different data centers.", 'start': 3336.234, 'duration': 2.961}, {'end': 3349.129, 'text': 'And these zones are going to store copies of data and might also have replicas of that data in different other zones.', 'start': 3341.332, 'duration': 7.797}, {'end': 3357.69, 'text': "And we'd like the zones to be semi-autonomous, so they can still continue functioning and doing load balancing within the zone on their own,", 'start': 3350.789, 'duration': 6.901}], 'summary': 'Designing spanner to support consistent operations across data centers, aiming for more automation and less manual labor.', 'duration': 48.267, 'max_score': 3309.423, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI3309423.jpg'}, {'end': 3403.916, 'src': 'embed', 'start': 3378.394, 'weight': 4, 'content': [{'end': 3384.38, 'text': "We'd rather they say I want three copies of the data, two in Europe and one in the US,", 'start': 3378.394, 'duration': 5.986}, {'end': 3387.065, 'text': 'rather than saying I want it in exactly these three data centers.', 'start': 3384.38, 'duration': 2.685}, {'end': 3392.142, 'text': 'All right.', 'start': 3391.561, 'duration': 0.581}, {'end': 3403.916, 'text': "the final portion of the talk is a set of design experiences and patterns that I think you'll see derived from some of these systems I've described.", 'start': 3392.142, 'duration': 11.774}], 'summary': 'Prefer three data copies: 2 in europe, 1 in us', 'duration': 25.522, 'max_score': 3378.394, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI3378394.jpg'}, {'end': 3471.658, 'src': 'embed', 'start': 3449.193, 'weight': 5, 'content': [{'end': 3458.384, 'text': 'And they can roll out new versions at whatever pace makes sense for them and are largely decoupled from the other people working on other distributed services.', 'start': 3449.193, 'duration': 9.191}, {'end': 3461.128, 'text': 'So this is pretty important.', 'start': 3459.486, 'duration': 1.642}, {'end': 3466.995, 'text': 'Small teams can work independently of other ones by carefully defining these interfaces and services.', 'start': 3461.528, 'duration': 5.467}, {'end': 3471.658, 'text': 'And that also makes it easier for us to have a lot of engineering offices around the world.', 'start': 3468.075, 'duration': 3.583}], 'summary': 'Decoupled services allow for independent work, enabling global engineering offices.', 'duration': 22.465, 'max_score': 3449.193, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI3449193.jpg'}, {'end': 3549.058, 'src': 'embed', 'start': 3522.551, 'weight': 6, 'content': [{'end': 3529.573, 'text': 'Is it easy to understand? Is the interface clean? But it also has some quantitative aspects, like how is it going to perform? and so on.', 'start': 3522.551, 'duration': 7.022}, {'end': 3536.455, 'text': "And so a really important skill when you're designing systems is being able to estimate, with the back of the envelope kind of calculation,", 'start': 3529.793, 'duration': 6.662}, {'end': 3542.337, 'text': 'what the performance of a system or several alternative designs is going to be, without actually having to build it.', 'start': 3536.455, 'duration': 5.882}, {'end': 3549.058, 'text': 'If you have to build four versions and measure them in order to figure out that, wow, those three were really bad ideas,', 'start': 3543.377, 'duration': 5.681}], 'summary': 'Design systems require estimating performance without building multiple versions.', 'duration': 26.507, 'max_score': 3522.551, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI3522551.jpg'}], 'start': 3245.879, 'title': "Google's spanner system and distributed systems design", 'summary': "Discusses google's spanner system, a global infrastructure with a single namespace for data, simplifying data sharing and operations. 
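The "high-level desires" idea described for Spanner (ask for "three copies, two in Europe and one in the US" rather than naming exact data centers) can be sketched as a declarative policy. This is a hypothetical illustration: the dataclass fields, zone names, and placement logic are invented and are not Spanner's actual API.

```python
# Hypothetical sketch of a declarative replication policy of the kind
# described for Spanner: users state high-level desires ("3 copies,
# 2 in Europe, 1 in the US") and the system picks concrete zones.
# Field names and the placement logic are invented for illustration.
from dataclasses import dataclass

@dataclass
class ReplicationPolicy:
    copies: int        # total number of replicas
    per_region: dict   # e.g. {"europe": 2, "us": 1}

def place_replicas(policy, zones_by_region):
    """Pick concrete zones satisfying the per-region counts."""
    placement = []
    for region, count in policy.per_region.items():
        available = zones_by_region.get(region, [])
        if len(available) < count:
            raise ValueError(f"not enough zones in {region}")
        placement.extend(available[:count])
    assert len(placement) == policy.copies
    return placement

policy = ReplicationPolicy(copies=3, per_region={"europe": 2, "us": 1})
zones = {"europe": ["eu-zone-a", "eu-zone-b"], "us": ["us-zone-c", "us-zone-d"]}
print(place_replicas(policy, zones))  # ['eu-zone-a', 'eu-zone-b', 'us-zone-c']
```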
It also covers the high-level zone design of Spanner, emphasizing data movement automation and system performance estimation.", 'chapters': [{'end': 3306.522, 'start': 3245.879, 'title': "Google's Spanner system", 'summary': "Discusses Google's Spanner system, an infrastructure designed to operate across multiple data centers, featuring a single global namespace for data, which allows for seamless data movement and replication, eliminating the need to update data names, thus simplifying data sharing and operations within Google.", 'duration': 60.643, 'highlights': ['The Spanner system is designed to run in a single instance across multiple data centers, marking a significant departure from previous systems at Google.', "It features a single global namespace for data, enabling seamless data movement and replication without changing the data's name, streamlining data sharing and operational processes within Google.", "Previously, changing data replication or moving data copies would result in a change of the data's name, complicating sharing across different groups, highlighting the importance of the new system's operational properties within Google."]}, {'end': 3630.873, 'start': 3309.423, 'title': 'Designing Spanner zones and distributed systems', 'summary': 'Discusses the high-level zone design of Spanner, emphasizing the automation of data movement across different data centers, the autonomous nature of zones, and the importance of breaking down systems into separate subsystems and estimating system performance.', 'duration': 321.45, 'highlights': ["The system aims to automate data movement across different data centers and make it less manual and labor-intensive. The hope is that the system is much more automated, especially when it's moving data across different data centers, than right now, which is kind of a fairly manual, labor-intensive process.", "Zones in different data centers store copies of data and may have replicas in other zones, aiming for semi-autonomous functionality and consistent operation even when partitioned. You're going to have a whole bunch of zones around the world in different data centers. And these zones are going to store copies of data and might also have replicas of that data in different other zones. And we'd like the zones to be semi-autonomous, so they can still continue functioning and doing load balancing within the zone on their own, even if they're partitioned away from the rest of the system.", "Users are encouraged to specify high-level desires rather than specific details, such as specifying the number of data copies in different regions rather than specific data center locations. And users are going to specify high-level desires rather than very specific things. We'd rather they say I want three copies of the data, two in Europe and one in the US, rather than saying I want it in exactly these three data centers.", 'Breaking down large software systems into separate distributed services enables independent work and easy interface specification, aiding in global engineering office expansion and development. Small teams can work independently of other ones by carefully defining these interfaces and services. And that also makes it easier for us to have a lot of engineering offices around the world.', "Importance of estimating system performance and making qualitative and quantitative evaluations when designing systems. 
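A back-of-the-envelope sketch of the kind of estimate this highlight refers to, using the thumbnail scenario discussed next (30 images of roughly a quarter megabyte each). The latency numbers are rough assumptions (about 10 ms per disk seek, about 50 MB/s sequential disk read), not measurements of any particular system.

```python
# Back-of-the-envelope comparison for the thumbnail example discussed
# below: read 30 images of ~256 KB each from disk, sequentially vs.
# in parallel across many disks.  The latency numbers are rough
# assumptions (~10 ms per seek, ~50 MB/s sequential disk read), not
# measurements of any particular system.
DISK_SEEK_MS = 10.0
DISK_READ_MB_PER_S = 50.0

def read_time_ms(size_mb):
    return DISK_SEEK_MS + size_mb / DISK_READ_MB_PER_S * 1000.0

images = 30
size_mb = 0.25

sequential_ms = images * read_time_ms(size_mb)   # one disk, one read at a time
parallel_ms = read_time_ms(size_mb)              # enough disks to read all at once

print(f"sequential: ~{sequential_ms:.0f} ms")    # ~450 ms, roughly half a second
print(f"parallel:   ~{parallel_ms:.0f} ms")      # ~15-20 ms
```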
And so a really important skill when you're designing systems is being able to estimate, with the back of the envelope kind of calculation, what the performance of a system or several alternative designs is going to be, without actually having to build it."]}], 'duration': 384.994, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI3245879.jpg', 'highlights': ['The Spanner system is designed to run in a single instance across multiple data centers, marking a significant departure from previous systems at Google.', "It features a single global namespace for data, enabling seamless data movement and replication without changing the data's name, streamlining data sharing and operational processes within Google.", 'The system aims to automate data movement across different data centers and make it less manual and labor-intensive.', 'Zones in different data centers store copies of data and may have replicas in other zones, aiming for semi-autonomous functionality and consistent operation even when partitioned.', 'Users are encouraged to specify high-level desires rather than specific details, such as specifying the number of data copies in different regions rather than specific data center locations.', 'Breaking down large software systems into separate distributed services enables independent work and easy interface specification, aiding in global engineering office expansion and development.', 'Importance of estimating system performance and making qualitative and quantitative evaluations when designing systems.']}, {'end': 4308.117, 'segs': [{'end': 3659.708, 'src': 'embed', 'start': 3631.273, 'weight': 0, 'content': [{'end': 3633.433, 'text': "And let's say I'm tasked with this thing.", 'start': 3631.273, 'duration': 2.16}, {'end': 3635.834, 'text': 'I have to generate 30 thumbnails for an image search.', 'start': 3633.473, 'duration': 2.361}, {'end': 3641.736, 'text': "And I have one design where I'm going to basically, for each of the 30 images, I'm going to do a disk seek.", 'start': 3636.594, 'duration': 5.142}, {'end': 3645.777, 'text': "And then I'm going to read the quarter megabyte image.", 'start': 3641.776, 'duration': 4.001}, {'end': 3648.838, 'text': "And then I'm going to go on to the next image.", 'start': 3646.777, 'duration': 2.061}, {'end': 3651.358, 'text': "So that's one design.", 'start': 3650.558, 'duration': 0.8}, {'end': 3653.179, 'text': 'It would take me roughly half a second.', 'start': 3651.418, 'duration': 1.761}, {'end': 3659.708, 'text': 'Obviously, design two, not obviously too difficult to figure out.', 'start': 3654.345, 'duration': 5.363}], 'summary': 'Tasked with generating 30 thumbnails for an image search, design one would take roughly half a second.', 'duration': 28.435, 'max_score': 3631.273, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI3631273.jpg'}, {'end': 3717.309, 'src': 'embed', 'start': 3675.575, 'weight': 1, 'content': [{'end': 3677.516, 'text': "But it'll be significantly better than the first design.", 'start': 3675.575, 'duration': 1.941}, {'end': 3685.305, 'text': 'And the back of the envelope calculations allow you to work out lots of different variations.', 'start': 3681.084, 'duration': 4.221}, {'end': 3689.906, 'text': "If you're going to cache in the system, does it make sense to cache the thumbnails of single images??", 'start': 3685.545, 'duration': 4.361}, {'end': 3693.967, 'text': 'Should you cache a whole set of 
thumbnails in one cache entry??', 'start': 3690.146, 'duration': 3.821}, {'end': 3696.488, 'text': 'Does it make sense to pre-compute thumbnails?', 'start': 3694.868, 'duration': 1.62}, {'end': 3702.57, 'text': "And these kinds of calculations are the kinds of things you should be doing over and over in your head when you're designing a system.", 'start': 3697.248, 'duration': 5.322}, {'end': 3710.361, 'text': "The other thing that's pretty important is to know the back of the envelope numbers for things you're building on top of.", 'start': 3704.215, 'duration': 6.146}, {'end': 3717.309, 'text': "If you don't know roughly how long it takes to do a write into your cache system,", 'start': 3710.942, 'duration': 6.367}], 'summary': 'Design improvements and caching strategy for system optimization.', 'duration': 41.734, 'max_score': 3675.575, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI3675575.jpg'}, {'end': 3774.308, 'src': 'embed', 'start': 3744.864, 'weight': 3, 'content': [{'end': 3750.586, 'text': "And it's really important to listen to the commonalities and figure out what it is they all really want.", 'start': 3744.864, 'duration': 5.722}, {'end': 3751.486, 'text': "It's a good thing.", 'start': 3750.846, 'duration': 0.64}, {'end': 3757.988, 'text': 'But if they tell you they want eight different things, usually six of them might be in common.', 'start': 3752.806, 'duration': 5.182}, {'end': 3759.468, 'text': 'And you should pick those six.', 'start': 3758.408, 'duration': 1.06}, {'end': 3762.189, 'text': "And you should do them, because that's clearly going to help a lot of people.", 'start': 3759.728, 'duration': 2.461}, {'end': 3768.391, 'text': 'If you really stretch yourself, you can usually handle an extra one that would help a lot of people too.', 'start': 3763.369, 'duration': 5.022}, {'end': 3774.308, 'text': "But if you try to do all eight, it's really going to probably result in a worse system for everyone,", 'start': 3770.025, 'duration': 4.283}], 'summary': 'Listening to commonalities is key, prioritize 6 needs to help many, stretching for 1 more is feasible, but not all 8.', 'duration': 29.444, 'max_score': 3744.864, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI3744864.jpg'}, {'end': 3832.748, 'src': 'embed', 'start': 3797.701, 'weight': 6, 'content': [{'end': 3804.547, 'text': "And then sometimes translate that into, if I did this feature, then you wouldn't need this other thing you're asking for.", 'start': 3797.701, 'duration': 6.846}, {'end': 3807.749, 'text': "And they're like, sometimes they say, oh yeah, that's true.", 'start': 3805.007, 'duration': 2.742}, {'end': 3815.856, 'text': "Another thing I found is that you don't want to build infrastructure just because it's fun to build infrastructure.", 'start': 3809.791, 'duration': 6.065}, {'end': 3817.337, 'text': 'You want to build it to address real needs.', 'start': 3815.876, 'duration': 1.461}, {'end': 3832.748, 'text': "And another trap is kind of when you're designing something to imagine these hypothetical uses like what if we had a client who wanted to do this and you design a lot of complexity and because you imagine that would be useful,", 'start': 3818.238, 'duration': 14.51}], 'summary': 'Design features based on real needs, not hypothetical uses.', 'duration': 35.047, 'max_score': 3797.701, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI3797701.jpg'}, {'end': 3907.665, 'src': 'embed', 'start': 3878.689, 'weight': 4, 'content': [{'end': 3880.75, 'text': 'Another thing is design for growth.', 'start': 3878.689, 'duration': 2.061}, {'end': 3886.932, 'text': 'So you try to anticipate which parameters of your system are actually going to grow and how much over time.', 'start': 3880.97, 'duration': 5.962}, {'end': 3888.672, 'text': "Obviously, you can't do this perfectly.", 'start': 3887.132, 'duration': 1.54}, {'end': 3892.913, 'text': "If you could, you'd predict the future and you'd all be better off.", 'start': 3888.732, 'duration': 4.181}, {'end': 3900.316, 'text': "But I've often found that it's not really useful to design a system so that it can scale infinitely.", 'start': 3894.294, 'duration': 6.022}, {'end': 3907.665, 'text': 'If you think about the in-memory example, our original disk-based index, we made that evolve pretty well.', 'start': 3901.255, 'duration': 6.41}], 'summary': 'Design for growth by anticipating system parameters and evolution, not for infinite scalability.', 'duration': 28.976, 'max_score': 3878.689, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI3878689.jpg'}, {'end': 3973.763, 'src': 'embed', 'start': 3947.72, 'weight': 5, 'content': [{'end': 3954.246, 'text': 'you can actually go quite far in a lot of distributed systems with a centralized component that has some amount of state.', 'start': 3947.72, 'duration': 6.526}, {'end': 3955.407, 'text': 'that makes a lot of things easier.', 'start': 3954.246, 'duration': 1.161}, {'end': 3958.69, 'text': 'So good examples of this are GFS.', 'start': 3956.548, 'duration': 2.142}, {'end': 3961.513, 'text': 'where we had the centralized metadata master.', 'start': 3959.751, 'duration': 1.762}, {'end': 3967.658, 'text': "Bigtable, which I didn't talk about, but has a centralized master that coordinates load balancing across a bunch of machines.", 'start': 3961.833, 'duration': 5.825}, {'end': 3973.763, 'text': 'MapReduce has the master for coordinating handing out of map tasks and various other examples.', 'start': 3967.718, 'duration': 6.045}], 'summary': 'Distributed systems can benefit from centralized components with state, as seen in gfs, bigtable, and mapreduce.', 'duration': 26.043, 'max_score': 3947.72, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI3947720.jpg'}], 'start': 3631.273, 'title': 'System design efficiency and performance', 'summary': 'Discusses the significance of back-of-the-envelope calculations, time efficiency, and performance gains in system design, with a focus on parallel processing and caching, showcasing the impact on generating 30 thumbnails for an image search. 
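The centralized-coordinator pattern mentioned in this chapter (the GFS metadata master, the Bigtable master, the MapReduce master handing out tasks) can be sketched as a single master that owns the task state and reassigns work when a worker fails. All class and task names below are invented for illustration.

```python
# Toy sketch of the centralized-coordinator pattern mentioned above
# (the GFS metadata master, the Bigtable master, the MapReduce master):
# one master holds the global view of tasks and hands them out to many
# workers, which also makes status reporting and recovery simpler.
# All names here are invented for illustration.
from collections import deque

class Master:
    def __init__(self, tasks):
        self.pending = deque(tasks)   # not yet assigned
        self.in_flight = {}           # task -> worker currently running it

    def assign(self, worker):
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.in_flight[task] = worker
        return task

    def report_done(self, task):
        self.in_flight.pop(task, None)

    def handle_worker_failure(self, worker):
        # Re-queue everything the failed worker was running.
        for task, w in list(self.in_flight.items()):
            if w == worker:
                del self.in_flight[task]
                self.pending.append(task)

master = Master(tasks=[f"map-shard-{i}" for i in range(4)])
print(master.assign("worker-1"))          # 'map-shard-0'
print(master.assign("worker-2"))          # 'map-shard-1'
master.handle_worker_failure("worker-1")  # shard 0 goes back to pending
print(master.assign("worker-3"))          # 'map-shard-2'
```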
It also emphasizes the importance of addressing user needs, avoiding unnecessary infrastructure, designing for growth, and implementing efficient distributed systems to manage workload and minimize latency.', 'chapters': [{'end': 3743.914, 'start': 3631.273, 'title': 'System design efficiency and performance', 'summary': 'Discusses the importance of back-of-the-envelope calculations in system design, considering the time efficiency and performance gains through parallel processing and caching, highlighting the impact on generating 30 thumbnails for an image search.', 'duration': 112.641, 'highlights': ['Parallel processing of disk reads can significantly improve performance, reducing the time to around 20 milliseconds from half a second for generating 30 thumbnails. Parallel processing with enough disks can reduce the time required to generate 30 thumbnails from half a second to around 20 milliseconds.', 'Considering back-of-the-envelope calculations for caching strategies and thumbnail pre-computation is crucial in system design for efficient performance. Back-of-the-envelope calculations for caching strategies and thumbnail pre-computation are crucial for efficient performance in system design.', 'Understanding the time it takes for write operations in a disk-based cache system is essential for making informed decisions on caching functionality. Understanding the time required for write operations in a disk-based cache system is crucial for making informed decisions on caching functionality.']}, {'end': 4308.117, 'start': 3744.864, 'title': 'Designing efficient systems', 'summary': 'Emphasizes the importance of focusing on commonalities to address user needs, avoiding building unnecessary infrastructure, designing for growth, and implementing efficient distributed systems to manage workload and minimize latency.', 'duration': 563.253, 'highlights': ['The importance of focusing on commonalities and addressing user needs by prioritizing features that are common among user requests, which can help a lot of people.', "The significance of avoiding unnecessary infrastructure and complex systems, as attempting to fulfill all user requests can compromise the system's effectiveness and potentially harm the clients.", 'The value of designing systems for growth by anticipating parameters that may scale over time, as well as the need to 
adapt to unexpected peak loads and resilience in the face of overload.', 'The efficiency of implementing distributed systems with centralized components to simplify system management, scale to a large number of workers, and provide a centralized place for system status information and recovery.', "The significance of avoiding unnecessary infrastructure and complex systems, as attempting to fulfill all user requests can compromise the system's effectiveness and potentially harm the clients."]}, {'end': 4963.09, 'segs': [{'end': 4370.423, 'src': 'heatmap', 'start': 4317.607, 'weight': 0.721, 'content': [{'end': 4330.118, 'text': "So when things are roughly balanced, I'm going to skip this slide in the interest of time and go ahead to final thoughts.", 'start': 4317.607, 'duration': 12.511}, {'end': 4343.185, 'text': "So I think one of the really exciting things about the period that we're in now is that we have really large collections of computational power in single locations and have very interesting data sets available.", 'start': 4330.238, 'duration': 12.947}, {'end': 4347.306, 'text': "Large fractions of the world's knowledge are now online.", 'start': 4344.565, 'duration': 2.741}, {'end': 4351.087, 'text': "There's all kinds of interesting new data sets available.", 'start': 4348.327, 'duration': 2.76}, {'end': 4359.59, 'text': "And there's a proliferation of more powerful client devices that I think can interact with these data center-based services in interesting ways.", 'start': 4351.448, 'duration': 8.142}, {'end': 4365.332, 'text': 'So that brings about lots of interesting opportunities for new kinds of research.', 'start': 4360.75, 'duration': 4.582}, {'end': 4367.555, 'text': 'Thank you.', 'start': 4367.215, 'duration': 0.34}, {'end': 4370.423, 'text': 'I put together a list of papers that cover some of the material there.', 'start': 4367.976, 'duration': 2.447}], 'summary': 'Exciting period- large computational power, interesting data sets, online knowledge, and proliferation of powerful client devices leading to new research opportunities.', 'duration': 52.816, 'max_score': 4317.607, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI4317607.jpg'}, {'end': 4569.515, 'src': 'embed', 'start': 4546.598, 'weight': 2, 'content': [{'end': 4554.484, 'text': "caused a fair amount of complexity, because there's lots of different tools that each solve one aspect of the larger problem.", 'start': 4546.598, 'duration': 7.886}, {'end': 4558.047, 'text': 'And it means more complexity for people trying to do those deployments.', 'start': 4555.225, 'duration': 2.822}, {'end': 4562.75, 'text': 'So I think simplification of complexity in this area is a big one.', 'start': 4558.067, 'duration': 4.683}, {'end': 4565.813, 'text': 'Yeah, in the back.', 'start': 4565.372, 'duration': 0.441}, {'end': 4569.515, 'text': 'So the volume of data on the web is growing.', 'start': 4566.173, 'duration': 3.342}], 'summary': 'Addressing complexity in deployment tools is crucial due to the growing volume of web data.', 'duration': 22.917, 'max_score': 4546.598, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI4546598.jpg'}, {'end': 4638.329, 'src': 'embed', 'start': 4613.651, 'weight': 0, 'content': [{'end': 4620.336, 'text': "if you include video, though, then there's a proliferation of high-quality video devices,", 'start': 4613.651, 'duration': 6.685}, {'end': 4626.3, 'text': 'which have a 
tremendous propensity to generate very high bandwidth requirements and very large amounts of data.', 'start': 4620.336, 'duration': 5.964}, {'end': 4632.685, 'text': "So I think that'll continue to be a big issue for the foreseeable future, because that, I think,", 'start': 4626.501, 'duration': 6.184}, {'end': 4638.329, 'text': 'will generate a lot more data than the devices are really ready to deal with large amounts of.', 'start': 4632.685, 'duration': 5.644}], 'summary': 'High-quality video devices generate high bandwidth and large amounts of data, posing a big issue for the foreseeable future.', 'duration': 24.678, 'max_score': 4613.651, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI4613651.jpg'}, {'end': 4688.843, 'src': 'embed', 'start': 4661.873, 'weight': 1, 'content': [{'end': 4668.3, 'text': "And there's a lot of both software and hardware work that can go on there trying to generate idleness,", 'start': 4661.873, 'duration': 6.427}, {'end': 4676.169, 'text': "trying to make systems more energy proportional, so that if you're using half of a machine's CPU, it uses half the power.", 'start': 4668.3, 'duration': 7.869}, {'end': 4679.172, 'text': "Right now, it's more like 70% or 80%.", 'start': 4676.229, 'duration': 2.943}, {'end': 4681.475, 'text': 'So I think a bunch of factors like that can help.', 'start': 4679.172, 'duration': 2.303}, {'end': 4688.843, 'text': 'Looking at lower power CPUs is another interesting area.', 'start': 4682.876, 'duration': 5.967}], 'summary': 'Efforts to improve system idleness and energy proportionality can reduce cpu power usage by 20-30%.', 'duration': 26.97, 'max_score': 4661.873, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI4661873.jpg'}, {'end': 4845.229, 'src': 'embed', 'start': 4800.374, 'weight': 3, 'content': [{'end': 4809.301, 'text': "What? You do write out and then read Sometimes you might write out to a cheaper storage system, for example, if it's intermediate data.", 'start': 4800.374, 'duration': 8.927}, {'end': 4812.083, 'text': 'And we know we could restart the whole computation from scratch if we want.', 'start': 4809.381, 'duration': 2.702}, {'end': 4819.089, 'text': 'Yeah Can you say something about Google Instant and what the impact of infrastructure is? 
Sure.', 'start': 4812.103, 'duration': 6.986}, {'end': 4825.134, 'text': 'So Google Instant is basically a system that is in the background,', 'start': 4819.789, 'duration': 5.345}, {'end': 4829.517, 'text': "predicting what query you're actually trying to issue when you type a few characters of it.", 'start': 4825.134, 'duration': 4.383}, {'end': 4831.939, 'text': "And then we'll actually prefetch the results for that.", 'start': 4829.977, 'duration': 1.962}, {'end': 4839.205, 'text': "So from a web search standpoint, it's essentially just getting certain kinds of requests.", 'start': 4833.3, 'duration': 5.905}, {'end': 4845.229, 'text': 'One of the things we do is when I mentioned making your system resilient to overload.', 'start': 4839.325, 'duration': 5.904}], 'summary': 'Google instant predicts user queries, prefetches results, and makes system resilient to overload.', 'duration': 44.855, 'max_score': 4800.374, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI4800374.jpg'}, {'end': 4921.769, 'src': 'embed', 'start': 4893.105, 'weight': 4, 'content': [{'end': 4897.011, 'text': 'So the question is what is our experience with using distributed transactions?', 'start': 4893.105, 'duration': 3.906}, {'end': 4901.137, 'text': "So we don't have a huge amount of experience with that.", 'start': 4898.996, 'duration': 2.141}, {'end': 4907.101, 'text': "A lot of the infrastructure we've built in the past has kind of avoided implementing distributed transactions.", 'start': 4901.537, 'duration': 5.564}, {'end': 4912.564, 'text': 'For example, Bigtable has single row transactions, but not distributed transactions.', 'start': 4908.401, 'duration': 4.163}, {'end': 4915.825, 'text': 'In retrospect, I think that was a mistake.', 'start': 4914.304, 'duration': 1.521}, {'end': 4921.769, 'text': 'We probably should have added them in, because what ended up happening is a lot of people did want distributed transactions.', 'start': 4916.926, 'duration': 4.843}], 'summary': 'Limited experience with distributed transactions, potential mistake in not implementing them for bigtable.', 'duration': 28.664, 'max_score': 4893.105, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI4893105.jpg'}], 'start': 4309.558, 'title': 'Data center management challenges', 'summary': 'Discusses challenges in data center management, including overload handling, resource allocation, networking coordination, infrastructure complexity, large-scale computational power, and data availability, with a focus on future considerations for growing data volumes and electricity usage.', 'chapters': [{'end': 4799.534, 'start': 4309.558, 'title': 'Data center management and infrastructure challenges', 'summary': 'Discusses the challenges and opportunities related to data center management, including overload handling, resource allocation, networking coordination, and infrastructure complexity, with a focus on large-scale computational power and data availability, as well as future considerations for handling growing data volumes and electricity usage.', 'duration': 489.976, 'highlights': ['The growth of the web, including video content, presents challenges for data storage and bandwidth requirements, with a proliferation of high-quality video devices generating large amounts of data. 
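A minimal sketch of the overload behavior described for Google Instant in this chapter: speculative prefetch requests are tagged as such, and when the serving system is overloaded they are shed before queries the user actually issued. The tag name and the load threshold are assumptions for illustration, not Google's actual protocol.

```python
# Minimal sketch of the overload behavior described for Google Instant:
# predictive prefetch requests are tagged, and when the backend is
# overloaded they are dropped first, while queries the user actually
# issued are still served.  The tag and the load threshold are
# assumptions for illustration, not Google's actual protocol.
from dataclasses import dataclass

@dataclass
class SearchRequest:
    query: str
    prefetch: bool   # True if issued speculatively while the user is typing

class Frontend:
    def __init__(self, overload_threshold=0.9):
        self.overload_threshold = overload_threshold

    def handle(self, request, current_load):
        # Under overload, shed speculative work before real queries.
        if request.prefetch and current_load >= self.overload_threshold:
            return "dropped (predictive prefetch shed under overload)"
        return f"served results for {request.query!r}"

fe = Frontend()
print(fe.handle(SearchRequest("stanford ee", prefetch=True), current_load=0.95))
print(fe.handle(SearchRequest("stanford ee380", prefetch=False), current_load=0.95))
```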
The textual web is not growing as fast as device capabilities, making it possible to store a large fraction of the web in a cabinet, but the proliferation of high-quality video devices generates high bandwidth requirements and large data volumes.', 'Challenges related to infrastructure complexity and cross-data center service deployment highlight the need for simplification and coordination, as multiple tools solve different aspects of the larger problem, causing increased complexity for deployments. Infrastructure complexity and cross-data center service deployment require simplification and coordination, as multiple tools solving different aspects lead to increased complexity for deployments.', 'Efforts to address power usage within data centers include software and hardware work to generate idleness, achieve energy proportionality, and explore lower power CPUs, aiming to manage electricity bills effectively. Efforts to address power usage within data centers involve software and hardware work to achieve energy proportionality and explore lower power CPUs, aiming to effectively manage electricity bills.']}, {'end': 4963.09, 'start': 4800.374, 'title': 'Google infrastructure insights', 'summary': "Discusses google's infrastructure, including the impact of google instant, system resilience to overload, and the use of distributed transactions within a single data center.", 'duration': 162.716, 'highlights': ['Google Instant predicts user queries and prefetches results, impacting web search requests and system resilience to overload by dropping predictive prefetch requests when overloaded.', 'The use of distributed transactions within a single data center was limited in the past, and the lack of built-in distributed transactions in infrastructure like Bigtable led to users hand-rolling their own protocols, highlighting the potential benefits of having distributed transactions available to everyone.', 'Cheaper storage systems are used for intermediate data, and the need for more machines is emphasized in handling predictive prefetch requests, indicating a resource issue in dealing with Google Instant.']}], 'duration': 653.532, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/modXC5IWTJI/pics/modXC5IWTJI4309558.jpg', 'highlights': ['The growth of the web, including video content, presents challenges for data storage and bandwidth requirements, with a proliferation of high-quality video devices generating large amounts of data.', 'Efforts to address power usage within data centers include software and hardware work to generate idleness, achieve energy proportionality, and explore lower power CPUs, aiming to manage electricity bills effectively.', 'Challenges related to infrastructure complexity and cross-data center service deployment highlight the need for simplification and coordination, as multiple tools solve different aspects of the larger problem, causing increased complexity for deployments.', 'Google Instant predicts user queries and prefetches results, impacting web search requests and system resilience to overload by dropping predictive prefetch requests when overloaded.', 'The use of distributed transactions within a single data center was limited in the past, and the lack of built-in distributed transactions in infrastructure like Bigtable led to users hand-rolling their own protocols, highlighting the potential benefits of having distributed transactions available to everyone.', 'Cheaper storage systems are used for intermediate data, and the need for 
more machines is emphasized in handling predictive prefetch requests, indicating a resource issue in dealing with Google Instant.']}], 'highlights': ['Google has achieved a 1,000x improvement in computational oomph by using more and faster machines since 1999.', 'The scale of documents indexed by Google has increased by about a factor of 1,000 from 1999 to today.', 'The traffic and queries handled by Google have also grown by a factor of 1,000 over the years.', 'Google has experienced about a 5x improvement in response time for users, measured at the server side, over the years.', "The update latency of Google's index has substantially improved, with portions of the index now being updated within a matter of seconds from calling the page.", 'Google has rolled out about seven significant revisions to its search system, resulting in a 1,000x improvement in computational oomph and a 5x improvement in response time over the last 11 years.', 'Google processes approximately an exabyte of data a month and runs four million jobs a month using MapReduce, demonstrating its large-scale data processing capabilities.', 'The Spanner system is designed to run in a single instance across multiple data centers, marking a significant departure from previous systems at Google.', 'The system aims to automate data movement across different data centers and make it less manual and labor-intensive.', 'Parallel processing with enough disks can reduce the time required to generate 30 thumbnails from half a second to around 20 milliseconds.', 'The value of designing systems for growth by anticipating parameters that may scale over time, as well as the need to adapt to unexpected peak loads and resilience in the face of overload.', 'Efforts to address power usage within data centers include software and hardware work to generate idleness, achieve energy proportionality, and explore lower power CPUs, aiming to manage electricity bills effectively.']}
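To put numbers on the energy-proportionality point in the closing highlights (an ideal machine at 50% utilization would draw 50% of peak power, whereas today's machines draw more like 70-80%), here is a small illustrative model; the idle-power fraction and peak wattage are assumed parameters, not measured values.

```python
# Rough illustration of the energy-proportionality point above: at 50%
# utilization an ideal machine would draw 50% of peak power, while
# today's machines draw more like 70-80%.  The idle fraction below is
# an assumed parameter used only to reproduce that shape.
def ideal_power(util, peak_watts):
    return util * peak_watts

def typical_power(util, peak_watts, idle_fraction=0.5):
    # Assume the machine draws ~50% of peak even when idle and scales
    # linearly from there; at util=0.5 this gives 75% of peak.
    return (idle_fraction + (1.0 - idle_fraction) * util) * peak_watts

peak = 300.0  # watts, assumed
for util in (0.1, 0.5, 1.0):
    print(f"util={util:.0%}: ideal={ideal_power(util, peak):.0f} W, "
          f"typical~{typical_power(util, peak):.0f} W")
# util=50%: ideal=150 W, typical~225 W (about 75% of peak, in the 70-80% range)
```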