title
How We've Scaled Dropbox

description
(February 22, 2012) Kevin Modzelewski talks about Dropbox and its history. He describes the technical issues Dropbox has faced and the actions the team has had to take to keep improving it. Stanford University: http://www.stanford.edu/ Stanford School of Engineering: http://soe.stanford.edu/ Stanford Computer Systems Colloquium: http://www.stanford.edu/class/ee380/ Stanford University Channel on YouTube: http://www.youtube.com/stanford

detail
{'title': "How We've Scaled Dropbox", 'heatmap': [{'end': 1188.209, 'start': 1107.92, 'weight': 1}], 'summary': "Explores dropbox's journey in scaling and building scalable systems, addressing technical and backend challenges, architectural changes, client-server connection, storage infrastructure, mysql performance, schema optimization, and the evolution of server file journal and data service, offering practical insights for startup founders and engineers.", 'chapters': [{'end': 225.829, 'segs': [{'end': 98.786, 'src': 'embed', 'start': 69.097, 'weight': 0, 'content': [{'end': 74.119, 'text': "My name's Kevin Moduleski, and I'm the server team lead at Dropbox.", 'start': 69.097, 'duration': 5.022}, {'end': 80.141, 'text': "Server team is a little bit of a historical name, as I'll talk about a little bit later.", 'start': 75, 'duration': 5.141}, {'end': 86.924, 'text': "But we're responsible for the architecture and evolution of the Dropbox backend, which is what I'm here to talk to you guys today about.", 'start': 80.161, 'duration': 6.763}, {'end': 98.786, 'text': 'So the rough structure of the talk is first an introduction about what this talk is, then some background about Dropbox, what Dropbox is,', 'start': 89.717, 'duration': 9.069}], 'summary': 'Kevin moduleski leads dropbox server team, responsible for backend architecture and evolution.', 'duration': 29.689, 'max_score': 69.097, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc69097.jpg'}, {'end': 157.636, 'src': 'embed', 'start': 131.276, 'weight': 1, 'content': [{'end': 138.041, 'text': "As he mentioned, there's a lot of talks and a lot of information out there about what do big systems look like?", 'start': 131.276, 'duration': 6.765}, {'end': 142.484, 'text': 'How do the Googles and Facebooks of the world?', 'start': 139.081, 'duration': 3.403}, {'end': 144.565, 'text': 'what do they have at this point?', 'start': 142.484, 'duration': 2.081}, 
{'end': 153.652, 'text': "But it doesn't help you a whole lot when you're starting off by yourself with maybe one other person and you have nothing and you have to get from there to having a lot.", 'start': 144.805, 'duration': 8.847}, {'end': 157.636, 'text': 'that if you wanted to build Dropbox now just with two people,', 'start': 154.372, 'duration': 3.264}], 'summary': 'Challenges of building big systems from scratch with limited resources.', 'duration': 26.36, 'max_score': 131.276, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc131276.jpg'}, {'end': 208.162, 'src': 'embed', 'start': 185.474, 'weight': 2, 'content': [{'end': 199.986, 'text': "So this is a talk about what it's like to work on a fast changing back end and a very quickly growing environment where your resources are growing at the same time as the demands and you can't necessarily sort of start with the solution,", 'start': 185.474, 'duration': 14.512}, {'end': 200.747, 'text': 'the final solution.', 'start': 199.986, 'duration': 0.761}, {'end': 204.221, 'text': 'And I think this should be interesting.', 'start': 202.64, 'duration': 1.581}, {'end': 208.162, 'text': 'This was the talk that I wish I had gotten while I was still in school.', 'start': 204.941, 'duration': 3.221}], 'summary': 'Adapting to fast-changing backend in a rapidly growing environment with evolving resources and demands.', 'duration': 22.688, 'max_score': 185.474, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc185474.jpg'}], 'start': 5.017, 'title': 'Scaling and building scalable systems', 'summary': 'Discusses challenges in scaling systems at dropbox and building scalable systems from scratch, addressing issues in backend engineering and providing practical insights for aspiring startup founders and engineers.', 'chapters': [{'end': 129.495, 'start': 5.017, 'title': 'Scaling systems at dropbox', 'summary': 
"Discusses the challenges of rapidly growing systems, as exemplified by dropbox's server team lead, kevin moduleski, who shares insights into scaling the backend infrastructure to meet the demands of extremely rapid demand growth.", 'duration': 124.478, 'highlights': ['Kevin Moduleski, server team lead at Dropbox, shares insights into scaling the backend infrastructure to meet the demands of extremely rapid demand growth.', 'Rapid growth is the goal of most startups, but it can be extremely hard, both financially and technically.', 'The talk covers an introduction to Dropbox, technical challenges faced, examples of scaled components, and considerations for alternative approaches.', "Stanford University's EE380 Winter 2011-2012 features a talk by Kevin Moduleski on scaling systems at Dropbox."]}, {'end': 168.67, 'start': 131.276, 'title': 'Building scalable systems from scratch', 'summary': 'Discusses the challenges of building scalable systems from scratch, emphasizing the disparity between starting from scratch and the infrastructure of tech giants like google and facebook, highlighting the need for alternative strategies.', 'duration': 37.394, 'highlights': ['Starting from scratch to build scalable systems presents significant challenges and disparities in comparison to established tech giants like Google and Facebook.', "The discussion emphasizes the limitations of utilizing Google's infrastructure for building a new system, highlighting the exclusivity of this option to Google as the only company with that capability.", 'The chapter emphasizes the need for alternative strategies when starting from scratch to build scalable systems, addressing the question of how to achieve the infrastructure of tech giants like Google and Facebook with limited resources.']}, {'end': 225.829, 'start': 170.44, 'title': 'Working at a startup: back-end engineering', 'summary': 'Discusses the challenges of working on a fast-changing back end in a rapidly growing environment, 
emphasizing the need to adapt to increasing demands and limited resources, offering practical insights for aspiring startup founders and engineers.', 'duration': 55.389, 'highlights': ['The challenges of working on a fast-changing back end in a rapidly growing environment are discussed, emphasizing the need to adapt to increasing demands and limited resources.', 'The talk provides practical insights for aspiring startup founders and engineers, addressing the discrepancy between theoretical knowledge and real-world constraints.', 'The speaker highlights the issue of not having ample resources or the luxury of investing vast amounts of time in projects, which contrasts with academic learning experiences.']}], 'duration': 220.812, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc5017.jpg', 'highlights': ['Kevin Moduleski, server team lead at Dropbox, shares insights into scaling the backend infrastructure to meet the demands of extremely rapid demand growth.', 'Starting from scratch to build scalable systems presents significant challenges and disparities in comparison to established tech giants like Google and Facebook.', 'The challenges of working on a fast-changing back end in a rapidly growing environment are discussed, emphasizing the need to adapt to increasing demands and limited resources.']}, {'end': 1194.057, 'segs': [{'end': 326.868, 'src': 'embed', 'start': 293.276, 'weight': 0, 'content': [{'end': 305.596, 'text': "As for scale, there's a tens of millions of people who are using this and who are syncing hundreds of millions of files a day.", 'start': 293.276, 'duration': 12.32}, {'end': 326.868, 'text': "So I'm going into this because there's actually some very interesting implications in terms of what we have to do on the back end to support something like this that there are very different back end choices that we have to make compared to companies such as Facebook not to pick on them or anything 
but just that we offer very different services that have very different requirements.", 'start': 305.616, 'duration': 21.252}], 'summary': 'Tens of millions of users syncing hundreds of millions of files daily, posing unique back-end challenges compared to companies like facebook.', 'duration': 33.592, 'max_score': 293.276, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc293276.jpg'}, {'end': 399.385, 'src': 'embed', 'start': 375.301, 'weight': 1, 'content': [{'end': 383.323, 'text': 'And so I believe Twitter has, I forget if it was 100 to 1 or 1, 000 to 1, something like that, of tweets read versus tweeted.', 'start': 375.301, 'duration': 8.022}, {'end': 393.086, 'text': "What's interesting about the way we've built Dropbox, though, is everyone's computer has a complete copy of their entire Dropbox.", 'start': 384.763, 'duration': 8.323}, {'end': 399.385, 'text': 'And this means that we basically have a multi-petabyte cache sitting in front of our service.', 'start': 394.161, 'duration': 5.224}], 'summary': 'Twitter has a 100 to 1 or 1,000 to 1 ratio of tweets read versus tweeted. 
dropbox has a multi-petabyte cache in front of its service.', 'duration': 24.084, 'max_score': 375.301, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc375301.jpg'}, {'end': 768.741, 'src': 'embed', 'start': 741.261, 'weight': 2, 'content': [{'end': 744.183, 'text': 'This one server was doing everything.', 'start': 741.261, 'duration': 2.922}, {'end': 745.904, 'text': 'It was running our application servers.', 'start': 744.203, 'duration': 1.701}, {'end': 752.228, 'text': "It was running the, I don't even know what web server they were running in front of it that was serving static content.", 'start': 746.424, 'duration': 5.804}, {'end': 758.012, 'text': 'It was running MySQL and restoring all data that anyone was putting on Dropbox on its local disks.', 'start': 752.768, 'duration': 5.244}, {'end': 763.116, 'text': "It's surprising, but yes, that's how it started.", 'start': 760.494, 'duration': 2.622}, {'end': 766.699, 'text': "And it's not that they didn't know how to build better things.", 'start': 763.156, 'duration': 3.543}, {'end': 768.741, 'text': "I mean, they're both MIT educated.", 'start': 766.92, 'duration': 1.821}], 'summary': 'One server was overloaded, handling application servers, web servers, mysql, and dropbox data, despite the knowledge of better practices by mit-educated individuals.', 'duration': 27.48, 'max_score': 741.261, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc741261.jpg'}, {'end': 809.497, 'src': 'embed', 'start': 783.454, 'weight': 3, 'content': [{'end': 790.257, 'text': "It's that they need to prove to themselves and everyone else that it was the right thing for them to quit their jobs and drop out of school and all of that stuff.", 'start': 783.454, 'duration': 6.803}, {'end': 795.619, 'text': 'So this is where, this is the humble beginnings of Dropbox.', 'start': 791.637, 'duration': 3.982}, {'end': 799.481, 
'text': "So say you've gotten to this point.", 'start': 797.38, 'duration': 2.101}, {'end': 801.954, 'text': 'What would you do from here?', 'start': 800.814, 'duration': 1.14}, {'end': 804.815, 'text': 'What do you think your two guys working?', 'start': 801.994, 'duration': 2.821}, {'end': 809.497, 'text': "I don't know why he likes to say this, but they like to say how they were working their boxers.", 'start': 805.175, 'duration': 4.322}], 'summary': 'Dropbox founders quit jobs, start humble beginnings, work in boxers.', 'duration': 26.043, 'max_score': 783.454, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc783454.jpg'}, {'end': 1188.209, 'src': 'heatmap', 'start': 1107.92, 'weight': 1, 'content': [{'end': 1117.733, 'text': 'So one of the next things that was done as well was add this new service called the notification servers or not servers that will start pinging.', 'start': 1107.92, 'duration': 9.813}, {'end': 1120.918, 'text': 'the clients will actually start pushing down notifications to them.', 'start': 1117.733, 'duration': 3.185}, {'end': 1133.118, 'text': 'And the server was split into two web servers, one running in managed hosting and one running in AWS,', 'start': 1123.01, 'duration': 10.108}, {'end': 1138.122, 'text': 'where the one in AWS is hosting all the file contents and accepting all the uploads.', 'start': 1133.118, 'duration': 5.004}, {'end': 1143.746, 'text': 'And the one in managed hosting is doing all the metadata calls.', 'start': 1139.243, 'duration': 4.503}, {'end': 1150.612, 'text': 'So they were called MetaServer and BlockServer because our file data API is based around file blocks.', 'start': 1144.407, 'duration': 6.205}, {'end': 1154.18, 'text': 'So this was early 2008.', 'start': 1153.14, 'duration': 1.04}, {'end': 1159.302, 'text': 'I think there were, I guess, roughly 50, 000 users at this time.', 'start': 1154.18, 'duration': 5.122}, {'end': 1161.903, 'text': 'Dropbox 
was in private beta.', 'start': 1159.923, 'duration': 1.98}, {'end': 1175.368, 'text': "And I guess I won't try to ask what's going to happen, because it's too hard to know what the exact things that are that's going to happen.", 'start': 1164.364, 'duration': 11.004}, {'end': 1188.209, 'text': "I guess that's part of the point of this talk that it's very easy to screw yourself by overbuilding because you don't even know what the things are that are going to fail.", 'start': 1175.909, 'duration': 12.3}], 'summary': 'In early 2008, dropbox had around 50,000 users in private beta, with servers split between managed hosting and aws for different functions.', 'duration': 80.289, 'max_score': 1107.92, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc1107920.jpg'}, {'end': 1175.368, 'src': 'embed', 'start': 1144.407, 'weight': 4, 'content': [{'end': 1150.612, 'text': 'So they were called MetaServer and BlockServer because our file data API is based around file blocks.', 'start': 1144.407, 'duration': 6.205}, {'end': 1154.18, 'text': 'So this was early 2008.', 'start': 1153.14, 'duration': 1.04}, {'end': 1159.302, 'text': 'I think there were, I guess, roughly 50, 000 users at this time.', 'start': 1154.18, 'duration': 5.122}, {'end': 1161.903, 'text': 'Dropbox was in private beta.', 'start': 1159.923, 'duration': 1.98}, {'end': 1175.368, 'text': "And I guess I won't try to ask what's going to happen, because it's too hard to know what the exact things that are that's going to happen.", 'start': 1164.364, 'duration': 11.004}], 'summary': 'In early 2008, dropbox had around 50,000 users during its private beta testing phase.', 'duration': 30.961, 'max_score': 1144.407, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc1144407.jpg'}], 'start': 227.076, 'title': "Dropbox's technical and backend challenges", 'summary': 'Delves into the technical aspects of dropbox 
startup, focusing on the scale of tens of millions of users syncing hundreds of millions of files daily. it also discusses the unique backend challenges faced, including the high write volume and the initial architecture. additionally, it highlights the humble beginnings of dropbox, emphasizing the strategic decisions and the achievement of a private beta with approximately 50,000 users by early 2008.', 'chapters': [{'end': 305.596, 'start': 227.076, 'title': 'Technical aspects of dropbox startup', 'summary': 'Discusses the technical backend aspect of dropbox startup, highlighting its goal to allow easy access to files and data, and the scale of tens of millions of users syncing hundreds of millions of files daily.', 'duration': 78.52, 'highlights': ["Dropbox's goal is to make it simple for users to access their files and data, achieved through their main sync product, used by tens of millions of people and syncing hundreds of millions of files daily.", 'The chapter emphasizes the importance of understanding the technical backend aspect of startups, providing insight into how Dropbox functions and its widespread usage among users.', 'The speaker mentions the availability of classes at Stanford on startup methodologies and technical processes, demonstrating the significance of understanding the technical aspects of startups.']}, {'end': 763.116, 'start': 305.616, 'title': 'Backend challenges at dropbox', 'summary': "Discusses the unique backend challenges faced by dropbox, including the high write volume, the need for high consistency and correctness, and the initial architecture of their backend, exemplifying the server team's elegant architecture and the single server setup.", 'duration': 457.5, 'highlights': ["Dropbox has a unique write-to-read ratio, with a roughly one-to-one ratio due to each user's computer having a complete copy of their entire Dropbox, resulting in 10 to 100 times more writes than other companies. 
Dropbox's write-to-read ratio is roughly one-to-one, as each user's computer stores a complete copy of their Dropbox, leading to 10 to 100 times more writes than other companies.", 'The high consistency and correctness requirements at Dropbox, represented by the ACID properties, make it a hard problem in distributed systems, as they cannot trade off atomicity, consistency, isolation, or durability. Dropbox has high consistency and correctness requirements, represented by the ACID properties, making it a challenging problem in distributed systems as they cannot compromise on atomicity, consistency, isolation, or durability.', "The initial elegant architecture of Dropbox's backend involved a single server that handled application servers, web servers, and MySQL, exhibiting an exceptionally simple and efficient setup. Dropbox's initial backend architecture was remarkably elegant, with a single server managing application servers, web servers, and MySQL, showcasing a simple and efficient setup."]}, {'end': 1194.057, 'start': 763.156, 'title': 'The humble beginnings of dropbox', 'summary': 'Discusses the early stages of dropbox, highlighting the conscious choice to prioritize proving themselves, the initial challenges with zero users and the strategic decisions made to address server capacity and functionality issues, leading to a private beta with approximately 50,000 users by early 2008.', 'duration': 430.901, 'highlights': ['The strategic decision to prioritize proving themselves and the initial challenges with zero users The founders prioritized proving themselves and faced initial challenges with approximately zero users in 2007, ultimately aiming to validate their decision to quit their jobs and drop out of school.', 'Strategic decisions made to address server capacity and functionality issues The team made strategic decisions to address server capacity and functionality issues, including moving data to Amazon S3, separating the MySQL instance to a different box, 
and splitting the work into multiple servers, resulting in a private beta with approximately 50,000 users by early 2008.', 'Introduction of a new service called the notification servers and the split of the server into two web servers The introduction of a new service called the notification servers and the split of the server into two web servers, one running in managed hosting and one running in AWS, played a pivotal role in the evolution of Dropbox in early 2008, during its private beta with approximately 50,000 users.']}], 'duration': 966.981, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc227076.jpg', 'highlights': ["Dropbox's main sync product is used by tens of millions of people and syncs hundreds of millions of files daily.", 'Dropbox has a unique write-to-read ratio, resulting in 10 to 100 times more writes than other companies.', "The initial elegant architecture of Dropbox's backend involved a single server that handled application servers, web servers, and MySQL.", 'The founders faced initial challenges with approximately zero users in 2007, aiming to validate their decision to quit their jobs and drop out of school.', 'Strategic decisions made to address server capacity and functionality issues resulted in a private beta with approximately 50,000 users by early 2008.']}, {'end': 2055.428, 'segs': [{'end': 1351.501, 'src': 'embed', 'start': 1318.778, 'weight': 0, 'content': [{'end': 1319.638, 'text': 'You can shard it.', 'start': 1318.778, 'duration': 0.86}, {'end': 1320.779, 'text': 'You can partition it.', 'start': 1319.798, 'duration': 0.981}, {'end': 1327, 'text': 'But it turns out that it was just so much easier to add memcache and just cache everything.', 'start': 1320.799, 'duration': 6.201}, {'end': 1330.28, 'text': 'Or not everything, but start caching the easy things to cache.', 'start': 1327.08, 'duration': 3.2}, {'end': 1339.362, 'text': 'And that just sort of let us avoid having to 
deal with these really complicated database scaling issues.', 'start': 1331.12, 'duration': 8.242}, {'end': 1351.501, 'text': 'So doing all those three things, we ended up with roughly this architecture at launch, where we added a bunch of meta servers and block servers,', 'start': 1340.99, 'duration': 10.511}], 'summary': 'Implemented memcache to simplify scaling, resulting in architecture with meta servers and block servers at launch.', 'duration': 32.723, 'max_score': 1318.778, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc1318778.jpg'}, {'end': 1592.062, 'src': 'embed', 'start': 1560.671, 'weight': 1, 'content': [{'end': 1568.083, 'text': 'So we had the option of either building our own load balancing software and we could add that feature if we wanted to,', 'start': 1560.671, 'duration': 7.412}, {'end': 1578.899, 'text': 'or we could We could play some sort of complicated game or we could allow just one load balancer to die and just lose a whole bunch of capacity,', 'start': 1568.083, 'duration': 10.816}, {'end': 1581.603, 'text': 'which we did for a little while.', 'start': 1578.899, 'duration': 2.704}, {'end': 1592.062, 'text': 'But the result we ended up with was we Every load balancer now has a pair, has a hot backup.', 'start': 1582.384, 'duration': 9.678}], 'summary': 'Implemented load balancer redundancy, achieving 1:1 pairing for resilience.', 'duration': 31.391, 'max_score': 1560.671, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc1560671.jpg'}, {'end': 1794.811, 'src': 'embed', 'start': 1761.991, 'weight': 2, 'content': [{'end': 1766.039, 'text': 'So the result is we have tens of millions of connections open to these knot servers.', 'start': 1761.991, 'duration': 4.048}, {'end': 1771.906, 'text': "And we're sending I forget the exact figure.", 'start': 1766.58, 'duration': 5.326}, {'end': 1774.607, 'text': "We're sending a lot of 
notifications out at the same time.", 'start': 1771.926, 'duration': 2.681}, {'end': 1785.429, 'text': 'So we actually had to add a two-level hierarchy for distributing all of this to all the knot servers to then distribute to the clients,', 'start': 1775.947, 'duration': 9.482}, {'end': 1794.811, 'text': 'because it was just too expensive to notify 100 knot server processes that they had to notify their clients.', 'start': 1785.429, 'duration': 9.382}], 'summary': 'Tens of millions of connections open to knot servers, sending a large number of notifications, requiring a two-level hierarchy for distribution.', 'duration': 32.82, 'max_score': 1761.991, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc1761991.jpg'}, {'end': 1954.356, 'src': 'embed', 'start': 1916.921, 'weight': 3, 'content': [{'end': 1920.042, 'text': 'What does that mean??', 'start': 1916.921, 'duration': 3.121}, {'end': 1922.202, 'text': 'Does it block a file or is it smaller than that??', 'start': 1920.062, 'duration': 2.14}, {'end': 1934.485, 'text': 'What we do is we take a file and we divide it up into four megabyte chunks, and each chunk is a block in terms of deduplication.', 'start': 1926.021, 'duration': 8.464}, {'end': 1941.609, 'text': 'So we take the hash of the chunk, and if two hashes are the same, then they get mapped to the same object in S3.', 'start': 1935.806, 'duration': 5.803}, {'end': 1950.193, 'text': 'How big is your hash? 
No.', 'start': 1941.669, 'duration': 8.524}, {'end': 1953.555, 'text': 'SHA-256 Yeah, OK.', 'start': 1950.213, 'duration': 3.342}, {'end': 1954.356, 'text': "I think it's SHA-256 then.", 'start': 1953.715, 'duration': 0.641}], 'summary': 'Files are divided into 4mb blocks for deduplication using sha-256 hash in s3.', 'duration': 37.435, 'max_score': 1916.921, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc1916921.jpg'}], 'start': 1196.117, 'title': 'Architectural changes and scaling challenges', 'summary': 'Discusses architectural changes addressing latency, database scaling, adding servers, rpcs, and memcache, resulting in a stable architecture. it also covers challenges in evolving high-throughput server architecture, including distributing notifications to millions of clients, sharding shared folders, and file deduplication.', 'chapters': [{'end': 1706.29, 'start': 1196.117, 'title': 'Architectural changes for database scaling', 'summary': 'Discusses the architectural changes made to address latency issues and database scaling by adding more servers, implementing rpcs, and using memcache to improve consistency, resulting in a more stable architecture with high availability load balancing.', 'duration': 510.173, 'highlights': ['Architectural changes to address latency issues and database scaling. The chapter discusses the need for architectural changes to address latency issues and database scaling, including adding more servers and implementing RPCs to improve performance.', 'Implementation of RPCs to encapsulate logic and handle database calls. The decision to have block servers do RPCs to encapsulate all the logic of database calls, addressing latency issues and improving performance.', 'Introduction of memcache to cache data and improve consistency. 
The addition of memcache to cache data and improve consistency, making it easier to avoid complex database scaling issues.', 'Implementation of high availability load balancing with hot backup. The implementation of a high availability load balancing cluster with hot backup to ensure high availability and improve performance.']}, {'end': 2055.428, 'start': 1708.21, 'title': 'Challenges in evolving high-throughput server architecture', 'summary': 'Discusses the challenges faced in evolving a high-throughput server architecture, including the need to distribute notifications to tens of millions of clients, the complexities of sharding shared folders, and the use of four megabyte chunks for file deduplication.', 'duration': 347.218, 'highlights': ['The need to distribute notifications to tens of millions of clients required the addition of a two-level hierarchy for distributing the notifications to all the knot servers, due to the high throughput of the system and the expense of notifying 100 knot server processes individually. Tens of millions of connections open to the knot servers, with a large number of notifications being sent simultaneously, led to the implementation of a two-level hierarchy for distributing notifications to all the knot servers.', 'Challenges in sharding shared folders arise due to the need to query the relationship table between users and shared folders in both directions, with a requirement for precise and accurate results for technical reasons. The complexities of sharding shared folders stem from the necessity to query the relationship table between users and shared folders in both directions and the technical requirement for precise results.', 'The use of four megabyte chunks and SHA-256 hashes for file deduplication, with the potential for double-digit percentage savings through deduplication, as well as the use of rsync diff for small modifications, demonstrates the strategies employed for efficient deduplication. 
The implementation of four megabyte chunks and SHA-256 hashes for file deduplication, along with the utilization of rsync diff for small modifications, showcases the efficient strategies for deduplication and potential double-digit percentage savings.']}], 'duration': 859.311, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc1196117.jpg', 'highlights': ['Introduction of memcache to cache data and improve consistency, making it easier to avoid complex database scaling issues.', 'The implementation of a high availability load balancing cluster with hot backup to ensure high availability and improve performance.', 'The need to distribute notifications to tens of millions of clients required the addition of a two-level hierarchy for distributing the notifications to all the knot servers.', 'The implementation of four megabyte chunks and SHA-256 hashes for file deduplication, along with the utilization of rsync diff for small modifications, showcases the efficient strategies for deduplication and potential double-digit percentage savings.']}, {'end': 2638.629, 'segs': [{'end': 2109.482, 'src': 'embed', 'start': 2059.391, 'weight': 0, 'content': [{'end': 2065.467, 'text': 'How often does the client pull the servers? 
Well, it used to be, I think, once a minute.', 'start': 2059.391, 'duration': 6.076}, {'end': 2073.731, 'text': 'But now that we have the not servers, and also now that we have tens of millions of clients, polling would just crush us.', 'start': 2065.507, 'duration': 8.224}, {'end': 2080.873, 'text': "Which is actually funny, because sometimes we didn't used to have good back offs.", 'start': 2076.411, 'duration': 4.462}, {'end': 2084.975, 'text': 'So when the site would go down, the clients would DDoS us.', 'start': 2081.413, 'duration': 3.562}, {'end': 2087.475, 'text': "But that's improved that a lot.", 'start': 2085.554, 'duration': 1.921}, {'end': 2091.677, 'text': "But now they don't poll at all, because they just connect to the not servers.", 'start': 2087.495, 'duration': 4.182}, {'end': 2095.514, 'text': 'So they just keep a connection open all the time? Yeah.', 'start': 2093.473, 'duration': 2.041}, {'end': 2097.695, 'text': 'Yeah, they long-poll the NOT servers.', 'start': 2096.275, 'duration': 1.42}, {'end': 2102.198, 'text': 'And then when we have a notification for them, it sends that down.', 'start': 2098.296, 'duration': 3.902}, {'end': 2109.482, 'text': 'What kind of connection count are you getting on a NOT server these days? 
I believe a single NOT server machine.', 'start': 2102.878, 'duration': 6.604}], 'summary': 'Client used to pull servers every minute, now long-pulls not servers due to millions of clients; improved back offs to prevent ddos attacks.', 'duration': 50.091, 'max_score': 2059.391, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc2059391.jpg'}, {'end': 2160.795, 'src': 'embed', 'start': 2138.363, 'weight': 3, 'content': [{'end': 2145.947, 'text': 'These things are not fun to push super hard, because even though a single machine can have a million connections open,', 'start': 2138.363, 'duration': 7.584}, {'end': 2149.329, 'text': "it can't open a million connections in any reasonable amount of time.", 'start': 2145.947, 'duration': 3.382}, {'end': 2153.251, 'text': "So once they go down, they're very hard to bring back up.", 'start': 2150.089, 'duration': 3.162}, {'end': 2156.633, 'text': "And we don't want to push them too close to the limit.", 'start': 2154.151, 'duration': 2.482}, {'end': 2160.795, 'text': 'Is your deduping on a per user basis?', 'start': 2157.993, 'duration': 2.802}], 'summary': 'Pushing a single machine too hard can cause difficulty in bringing it back up, limiting connections and deduping per user basis.', 'duration': 22.432, 'max_score': 2138.363, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc2138363.jpg'}, {'end': 2336.781, 'src': 'embed', 'start': 2305.947, 'weight': 4, 'content': [{'end': 2306.967, 'text': 'Everything is in Virginia.', 'start': 2305.947, 'duration': 1.02}, {'end': 2313.091, 'text': "I don't know the exact percentage that's international, but I think the majority is international at this point.", 'start': 2308.228, 'duration': 4.863}, {'end': 2313.232, 'text': 'Like 65.', 'start': 2313.111, 'duration': 0.121}, {'end': 2313.672, 'text': '65% international usage.', 'start': 2313.232, 'duration': 0.44}, {'end': 
2321.709, 'text': 'So yes, we do serve all the data out of Virginia.', 'start': 2318.446, 'duration': 3.263}, {'end': 2325.532, 'text': 'We do serve all the metadata out of San Jose now.', 'start': 2321.749, 'duration': 3.783}, {'end': 2336.781, 'text': 'And I guess this is another point that we obviously know that if you want better performance, you go international and you figure that out.', 'start': 2327.834, 'duration': 8.947}], 'summary': 'About 65% of usage is international, with data served from virginia and metadata from san jose.', 'duration': 30.834, 'max_score': 2305.947, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc2305947.jpg'}, {'end': 2554.385, 'src': 'embed', 'start': 2529.638, 'weight': 5, 'content': [{'end': 2535.321, 'text': 'We watch how many requests are happening from all the different channels per second.', 'start': 2529.638, 'duration': 5.683}, {'end': 2540.662, 'text': 'We watched the breakdown for important requests.', 'start': 2537.281, 'duration': 3.381}, {'end': 2544.043, 'text': "What's the breakdown in time that went into that request?", 'start': 2541.502, 'duration': 2.541}, {'end': 2554.385, 'text': "So if it takes 100 milliseconds to commit new files, that's 40 milliseconds of CPU time on the web server.", 'start': 2544.683, 'duration': 9.702}], 'summary': 'Monitoring requests from various channels and analyzing time breakdown for important requests, with 40 milliseconds of cpu time on the web server for committing new files.', 'duration': 24.747, 'max_score': 2529.638, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc2529638.jpg'}, {'end': 2654.463, 'src': 'embed', 'start': 2618.024, 'weight': 6, 'content': [{'end': 2620.545, 'text': 'a while ago, like what were you doing then?', 'start': 2618.024, 'duration': 2.521}, {'end': 2621.145, 'text': "that didn't work.", 'start': 2620.545, 'duration': 0.6}, {'end': 
2624.986, 'text': 'and what have you done now to keep that sort of thing from?', 'start': 2621.145, 'duration': 3.841}, {'end': 2628.046, 'text': 'like where are things encrypted and decrypted?', 'start': 2624.986, 'duration': 3.06}, {'end': 2629.226, 'text': "What's the risk?", 'start': 2628.086, 'duration': 1.14}, {'end': 2638.629, 'text': "I can't talk too much about any specific thing that has happened, though I can say that, just in general,", 'start': 2629.967, 'duration': 8.662}, {'end': 2643.77, 'text': 'we take security and privacy very seriously and respond very aggressively whenever something does happen.', 'start': 2638.629, 'duration': 5.141}, {'end': 2654.463, 'text': "In general, yeah, I guess there's not a whole lot I can go into right now.", 'start': 2647.819, 'duration': 6.644}], 'summary': 'Emphasizing strong security and privacy measures, responding aggressively to incidents.', 'duration': 36.439, 'max_score': 2618.024, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc2618024.jpg'}], 'start': 2059.391, 'title': 'Client-server connection and storage infrastructure', 'summary': 'Delves into client-server connection challenges and scaling, such as transitioning to long-polling with NOT servers capable of handling a million connections per machine and improvements in client behavior. 
it also covers storage infrastructure and operations, including decision-making between amazon and self-hosting, international usage at 65%, server load and request monitoring, and encryption measures.', 'chapters': [{'end': 2160.795, 'start': 2059.391, 'title': 'Client server connection and scaling', 'summary': 'Discusses the challenges of client-server connections, including the transition from polling to long-polling with the not servers, which can handle a million connections per machine and the improvements in client behavior resulting in better server performance.', 'duration': 101.404, 'highlights': ['The NOT servers can handle a million connections per machine, resulting in improved server performance and client behavior.', 'The transition from polling to long-polling with the NOT servers has significantly reduced the load on the servers, preventing crushes due to millions of clients.', 'The improvement in client behavior has mitigated DDoS attacks when the site goes down, resulting in better overall performance.', 'The challenges of pushing the NOT servers to their limit due to the difficulty in bringing them back up and the need to avoid pushing them too close to the limit.']}, {'end': 2638.629, 'start': 2161.035, 'title': 'Storage infrastructure and operations', 'summary': 'Discusses the storage infrastructure and operations, including decision-making between amazon and self-hosting, international usage, monitoring and metrics, and security and encryption, with highlights on international usage at 65%, server load and request monitoring, and encryption and decryption measures.', 'duration': 477.594, 'highlights': ['The majority of the customer base is international at 65%.', 'Monitoring includes tracking server load, requests from different channels, breakdown of time for requests, and bandwidth measured by users.', 'Security measures and encryption strategies are in place, but specific details cannot be disclosed.']}], 'duration': 579.238, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc2059391.jpg', 'highlights': ['The NOT servers can handle a million connections per machine, resulting in improved server performance and client behavior.', 'The transition from polling to long-polling with the NOT servers has significantly reduced the load on the servers, preventing crushes due to millions of clients.', 'The improvement in client behavior has mitigated DDoS attacks when the site goes down, resulting in better overall performance.', 'The challenges of pushing the NOT servers to their limit due to the difficulty in bringing them back up and the need to avoid pushing them too close to the limit.', 'The majority of the customer base is international at 65%.', 'Monitoring includes tracking server load, requests from different channels, breakdown of time for requests, and bandwidth measured by users.', 'Security measures and encryption strategies are in place, but specific details cannot be disclosed.']}, {'end': 2972.272, 'segs': [{'end': 2743.864, 'src': 'embed', 'start': 2680.533, 'weight': 0, 'content': [{'end': 2687.197, 'text': 'And in particular, diving into how we store all the metadata about your Dropbox.', 'start': 2680.533, 'duration': 6.664}, {'end': 2698.91, 'text': 'The way we store the metadata for what you have in your Dropbox is as a log of all the edits that has happened to it.', 'start': 2690.424, 'duration': 8.486}, {'end': 2707.297, 'text': 'So whenever your client notices changes, it uploads those changes to the meta servers, which record them in this log.', 'start': 2699.971, 'duration': 7.326}, {'end': 2710.359, 'text': 'And this is called the server file journal.', 'start': 2708.497, 'duration': 1.862}, {'end': 2716.163, 'text': "I believe there's also a client-side version of this as well, which is why it's called the server file journal.", 'start': 2710.899, 'duration': 5.264}, {'end': 2725.65, 'text': 'This is a abridged schema of 
server file journal.', 'start': 2721.465, 'duration': 4.185}, {'end': 2732.678, 'text': 'This is the original one that we started with, or the earliest one that I could find at least.', 'start': 2726.411, 'duration': 6.267}, {'end': 2737.864, 'text': "And it's only including the sort of interesting fields in it.", 'start': 2734.22, 'duration': 3.644}, {'end': 2743.864, 'text': 'It has an ID field, which is just the index in the log.', 'start': 2741.203, 'duration': 2.661}], 'summary': 'Dropbox stores metadata as a log of edits, uploaded to meta servers for recording.', 'duration': 63.331, 'max_score': 2680.533, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc2680533.jpg'}, {'end': 2805.127, 'src': 'embed', 'start': 2777.211, 'weight': 2, 'content': [{'end': 2779.773, 'text': "We're using MySQL and in particular InnoDB here.", 'start': 2777.211, 'duration': 2.562}, {'end': 2786.479, 'text': 'So what this means is that on disk, things are ordered by ID.', 'start': 2780.534, 'duration': 5.945}, {'end': 2789.281, 'text': "That's what the primary key means.", 'start': 2786.979, 'duration': 2.302}, {'end': 2792.263, 'text': "So it's very fast to scan things in ID order.", 'start': 2789.381, 'duration': 2.882}, {'end': 2796.985, 'text': 'Any other order is not as fast, even if you have an index on it.', 'start': 2793.404, 'duration': 3.581}, {'end': 2802.066, 'text': 'And writing things, appending things in ID order is extremely fast.', 'start': 2798.145, 'duration': 3.921}, {'end': 2805.127, 'text': 'Appending it in any other order is not as fast.', 'start': 2802.866, 'duration': 2.261}], 'summary': 'Using mysql with innodb ensures fast scanning and appending in id order.', 'duration': 27.916, 'max_score': 2777.211, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc2777211.jpg'}, {'end': 2920.654, 'src': 'embed', 'start': 2888.325, 'weight': 4, 'content': 
[{'end': 2892.207, 'text': 'And then you can list those as a list of revisions.', 'start': 2888.325, 'duration': 3.882}, {'end': 2902.07, 'text': 'So, to make this faster, because this was a new feature that we wanted to make more efficient, we added a new field called prevrev,', 'start': 2893.247, 'duration': 8.823}, {'end': 2908.532, 'text': 'which I believe points to the ID of the previous entry of that file.', 'start': 2902.07, 'duration': 6.462}, {'end': 2912.374, 'text': 'So this was added because we added a new feature.', 'start': 2910.293, 'duration': 2.081}, {'end': 2920.654, 'text': 'The next thing was the performance of the system started to get pretty bad.', 'start': 2914.715, 'duration': 5.939}], 'summary': "Implemented new field 'prevrev' to improve system performance.", 'duration': 32.329, 'max_score': 2888.325, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc2888325.jpg'}, {'end': 2972.272, 'src': 'embed', 'start': 2945.566, 'weight': 5, 'content': [{'end': 2951.987, 'text': 'So what this means is first, things are sorted by NSID, so everything within an NSID is grouped together.', 'start': 2945.566, 'duration': 6.421}, {'end': 2955.828, 'text': 'Then latest.', 'start': 2953.188, 'duration': 2.64}, {'end': 2964.01, 'text': "so that means that there's two sections of the log one that is sort of previous entries and one that is all your sort of current active state of your Dropbox.", 'start': 2955.828, 'duration': 8.182}, {'end': 2972.272, 'text': 'And then ID to sort of sort it into essentially timestamp order and have a log of that.', 'start': 2965.03, 'duration': 7.242}], 'summary': 'Transcript explains sorting by nsid, showing previous and current entries, and sorting by id for timestamp order.', 'duration': 26.706, 'max_score': 2945.566, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc2945566.jpg'}], 'start': 2638.629, 'title': 
'Dropbox, mysql, and system performance', 'summary': "Covers dropbox's metadata storage, mysql innodb performance optimization, and system performance improvements. it includes a server file journal for metadata storage in dropbox, the efficiency of innodb in mysql, and system performance enhancements to address increasing log size.", 'chapters': [{'end': 2775.07, 'start': 2638.629, 'title': 'Dropbox metadata storage', 'summary': 'Discusses the security measures taken by dropbox, and delves into the storage of metadata using a server file journal, which records edits and changes made to files, with an id field as the primary key.', 'duration': 136.441, 'highlights': ['The server file journal records all the edits made to the metadata of files in Dropbox, with changes being uploaded to the meta servers and logged in the journal.', 'The server file journal schema includes fields such as ID, file name, case path, latest, and NSID, with the ID field serving as the primary key.']}, {'end': 2857.699, 'start': 2777.211, 'title': 'Mysql innodb performance optimization', 'summary': 'Discusses the performance benefits of using innodb in mysql, emphasizing the efficiency of scanning and appending data in id order, and highlights the changes made over time to address case sensitivity issues.', 'duration': 80.488, 'highlights': ['Appending data in ID order in MySQL InnoDB is extremely fast, while appending it in any other order is not as fast.', 'On disk, things in MySQL InnoDB are ordered by ID, making it very fast to scan things in ID order.', 'Changes were made over time to address case sensitivity issues, such as getting rid of case path and shifting the responsibility for handling case sensitivity to the clients.']}, {'end': 2972.272, 'start': 2859.36, 'title': 'Improving system performance and file revision feature', 'summary': "Discusses the addition of a new field called prevrev to make the feature of viewing file revisions more efficient, and the reorganization of the 
system's log to improve performance by sorting entries by nsid and timestamp, in response to the system's deteriorating performance due to the increasing log size.", 'duration': 112.912, 'highlights': ["A new field called prevrev was added to improve the efficiency of viewing file revisions by pointing to the ID of the previous entry of that file, in response to the feature's performance issues.", "The system's performance deteriorated due to the increasing log size, prompting the reorganization of the log by sorting entries by NSID and timestamp to improve efficiency.", "The feature of viewing all past revisions of a file was initially inefficient and expensive to perform, as it required searching the entire log of everyone's drop boxes, leading to the addition of the prevrev field to address this inefficiency."]}], 'duration': 333.643, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc2638629.jpg', 'highlights': ['The server file journal records all the edits made to the metadata of files in Dropbox, with changes being uploaded to the meta servers and logged in the journal.', 'The server file journal schema includes fields such as ID, file name, case path, latest, and NSID, with the ID field serving as the primary key.', 'Appending data in ID order in MySQL InnoDB is extremely fast, while appending it in any other order is not as fast.', 'On disk, things in MySQL InnoDB are ordered by ID, making it very fast to scan things in ID order.', "A new field called prevrev was added to improve the efficiency of viewing file revisions by pointing to the ID of the previous entry of that file, in response to the feature's performance issues.", "The system's performance deteriorated due to the increasing log size, prompting the reorganization of the log by sorting entries by NSID and timestamp to improve efficiency."]}, {'end': 3604.694, 'segs': [{'end': 3042.285, 'src': 'embed', 'start': 3008.143, 'weight': 0, 
'content': [{'end': 3015.986, 'text': 'So the next thing that was done was file name was changed from a 260 length string to a 255.', 'start': 3008.143, 'duration': 7.843}, {'end': 3026.352, 'text': "It seems like a kind of random thing to do if you know anything less than a lot about MySQL.", 'start': 3015.986, 'duration': 10.366}, {'end': 3029.395, 'text': "It's not clear at all why this would happen.", 'start': 3027.213, 'duration': 2.182}, {'end': 3042.285, 'text': 'But it turns out that actually MySQL stores VARCHARs with size at most 255 more efficiently than with a size larger than that.', 'start': 3030.195, 'duration': 12.09}], 'summary': 'File name was optimized by reducing length from 260 to 255 characters for more efficient storage in MySQL.', 'duration': 34.142, 'max_score': 3008.143, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc3008143.jpg'}, {'end': 3088.552, 'src': 'embed', 'start': 3061.357, 'weight': 1, 'content': [{'end': 3067.322, 'text': "There's just one of those, like reading the manual, taking the time to actually do that, taking time away from building features,", 'start': 3061.357, 'duration': 5.965}, {'end': 3068.463, 'text': 'actually started to make sense.', 'start': 3067.322, 'duration': 1.141}, {'end': 3076.791, 'text': 'I think around the same time, all these fields were declared NOT NULL because that also saves another byte per field.', 'start': 3069.184, 'duration': 7.607}, {'end': 3088.552, 'text': 'The next thing that was done was a little bit more subtle, and that was getting rid of latest in the primary key.', 'start': 3079.344, 'duration': 9.208}], 'summary': 'Efficiency improvements: taking the time to read the manual started to make sense, fields declared NOT NULL saved a byte per field, and latest was removed from the primary key.', 'duration': 27.195, 'max_score': 3061.357, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc3061357.jpg'}, {'end': 3315.432, 'src': 'embed', 'start': 3249.178, 'weight': 3, 'content': [{'end': 3256.08, 'text': "There's some simple things we can do where we can actually run a production workload on a production box.", 'start': 3249.178, 'duration': 6.902}, {'end': 3259.542, 'text': "It's still hard to make that work.", 'start': 3257.041, 'duration': 2.501}, {'end': 3265.184, 'text': "I think these changes all happened a while ago, so I'm not 100% sure what went into them.", 'start': 3260.862, 'duration': 4.322}, {'end': 3279.293, 'text': 'Yeah, these days, really the only way to, if you really want good precision on whether or not change will be helpful, is to test it in prod.', 'start': 3269.747, 'duration': 9.546}, {'end': 3286.457, 'text': "And it's just too hard to sort of generate realistic data in one close.", 'start': 3282.195, 'duration': 4.262}, {'end': 3304.663, 'text': 'Do you do A-B testing? bring up a new build, a canary, and take some part of the load and migrate over into a prod? 
Yeah, we do.', 'start': 3287.038, 'duration': 17.625}, {'end': 3311.589, 'text': "We're increasing our usage of stage rollouts and A-B testing and all that kind of stuff.", 'start': 3304.683, 'duration': 6.906}, {'end': 3315.432, 'text': "That's the only way to find out.", 'start': 3314.371, 'duration': 1.061}], 'summary': 'Testing in production is essential for good precision in determining helpful changes; a-b testing and stage rollouts are being increasingly used.', 'duration': 66.254, 'max_score': 3249.178, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc3249178.jpg'}, {'end': 3369.831, 'src': 'embed', 'start': 3345.95, 'weight': 2, 'content': [{'end': 3353.58, 'text': "I think the interesting things about this evolution is that we've seen, especially with the primary key changes.", 'start': 3345.95, 'duration': 7.63}, {'end': 3358.407, 'text': "we've seen massive changes in the performance characteristics of this table over time.", 'start': 3353.58, 'duration': 4.827}, {'end': 3364.608, 'text': 'what is a very small amount of text to change the primary key.', 'start': 3360.546, 'duration': 4.062}, {'end': 3369.831, 'text': 'I mean MySQL has to do a lot of work and you have to be careful about telling MySQL to do it,', 'start': 3365.109, 'duration': 4.722}], 'summary': 'Evolution led to significant performance changes with primary key changes in mysql.', 'duration': 23.881, 'max_score': 3345.95, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc3345950.jpg'}, {'end': 3542.947, 'src': 'embed', 'start': 3514.069, 'weight': 6, 'content': [{'end': 3518.81, 'text': "We're still having to make all these trade-offs, that we know all these things that we could be building,", 'start': 3514.069, 'duration': 4.741}, {'end': 3520.77, 'text': 'but we know that we have fewer people than we want.', 'start': 3518.81, 'duration': 1.96}, {'end': 3526.611, 'text': "And 
so here these are just some examples of decisions that we're currently going through.", 'start': 3522.07, 'duration': 4.541}, {'end': 3536.873, 'text': 'that exhibit some of the same properties that we want to have some sort of batch processing infrastructure that can run jobs over our metadata.', 'start': 3526.611, 'duration': 10.262}, {'end': 3542.947, 'text': "And if you just sit down and think about what's the best way to do it, you could say oh,", 'start': 3537.805, 'duration': 5.142}], 'summary': 'Trade-offs due to limited resources in decision-making process.', 'duration': 28.878, 'max_score': 3514.069, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc3514069.jpg'}], 'start': 2974.352, 'title': 'Optimizing schema and performance', 'summary': 'Discusses optimizations made to the schema such as reducing file name length, declaring fields as not null, and optimizing for writes at the expense of reads, resulting in more efficient storage and performance improvements. it also covers the challenges of testing performance changes, the use of a-b testing and stage rollouts, the impact of primary key changes on table performance, and the trade-offs made due to resource constraints.', 'chapters': [{'end': 3226.536, 'start': 2974.352, 'title': 'Optimizing schema for performance', 'summary': 'Discusses optimizations made to the schema, including reducing file name length, declaring fields as not null, and optimizing for writes at the expense of reads, resulting in more efficient storage and performance improvements.', 'duration': 252.184, 'highlights': ["File name length was changed from 260 to 255, resulting in more efficient storage due to MySQL's handling of char size, saving bytes per field. Reducing file name length from 260 to 255, optimizing storage efficiency, saving bytes per field.", 'Fields were declared not null, saving another byte per field and further optimizing storage. 
Declaring fields as not null, saving another byte per field, optimizing storage.', "Optimizing for writes at the expense of reads by reorganizing the primary key, resulting in performance improvements especially due to the system's high volume of writes. Reorganizing the primary key, optimizing for writes at the expense of reads, performance improvements due to high volume of writes."]}, {'end': 3604.694, 'start': 3230.118, 'title': 'Testing and implementing performance changes', 'summary': 'Discusses the challenges of testing performance changes, the use of a-b testing and stage rollouts, the impact of primary key changes on table performance, and the trade-offs made due to resource constraints.', 'duration': 374.576, 'highlights': ['The difficulty of testing performance changes and the importance of testing in production to ensure good precision and accuracy.', 'The use of A-B testing and stage rollouts to assess the impact of changes and improve operational ability incrementally.', 'The significant performance impact of primary key changes on table performance, and the advantage of being able to pivot on the fly with MySQL compared to other solutions.', 'The trade-offs made due to resource constraints and the need to use time effectively, highlighting the challenges of making decisions and trade-offs with limited resources.']}], 'duration': 630.342, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc2974352.jpg', 'highlights': ['Reducing file name length from 260 to 255, optimizing storage efficiency, saving bytes per field.', 'Fields were declared not null, saving another byte per field, optimizing storage.', "Optimizing for writes at the expense of reads by reorganizing the primary key, resulting in performance improvements especially due to the system's high volume of writes.", 'The difficulty of testing performance changes and the importance of testing in production to ensure good precision and 
accuracy.', 'The use of A-B testing and stage rollouts to assess the impact of changes and improve operational ability incrementally.', 'The significant performance impact of primary key changes on table performance, and the advantage of being able to pivot on the fly with MySQL compared to other solutions.', 'The trade-offs made due to resource constraints and the need to use time effectively, highlighting the challenges of making decisions and trade-offs with limited resources.']}, {'end': 4094.74, 'segs': [{'end': 3629.463, 'src': 'embed', 'start': 3605.715, 'weight': 0, 'content': [{'end': 3612.818, 'text': 'So maybe the next evolution of server file journal, rather than re-architecting it at a MySQL level,', 'start': 3605.715, 'duration': 7.103}, {'end': 3615.579, 'text': 'will just be we buy a whole bunch of SSDs and put it on that instead.', 'start': 3612.818, 'duration': 2.761}, {'end': 3621.661, 'text': 'If we can save months and months of engineer time by doing that, then maybe that makes sense.', 'start': 3616.099, 'duration': 5.562}, {'end': 3629.463, 'text': "So these are both things that we haven't done yet, but show the same kinds of decisions being made.", 'start': 3622.621, 'duration': 6.842}], 'summary': 'Consider using ssds for server file journal to save months of engineer time.', 'duration': 23.748, 'max_score': 3605.715, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc3605715.jpg'}, {'end': 3696.259, 'src': 'embed', 'start': 3668.195, 'weight': 2, 'content': [{'end': 3670.901, 'text': 'So I mean we want to just.', 'start': 3668.195, 'duration': 2.706}, {'end': 3671.502, 'text': 'we always want to be.', 'start': 3670.901, 'duration': 0.601}, {'end': 3674.266, 'text': 'We want to appeal to more people.', 'start': 3672.365, 'duration': 1.901}, {'end': 3677.328, 'text': 'We want more people to be happy to be using us all the time.', 'start': 3674.787, 'duration': 2.541}, {'end': 3679.149, 
'text': "That's always a goal.", 'start': 3678.389, 'duration': 0.76}, {'end': 3682.311, 'text': 'We can always have more people be using us.', 'start': 3679.209, 'duration': 3.102}, {'end': 3689.215, 'text': "But the ultimate goal is that you don't have to think about where your data is.", 'start': 3683.592, 'duration': 5.623}, {'end': 3696.259, 'text': "I shouldn't have had to think about the fact that the data was on this laptop for the presentation,", 'start': 3690.856, 'duration': 5.403}], 'summary': 'Goal: increase user base and ensure seamless data access and usage.', 'duration': 28.064, 'max_score': 3668.195, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc3668195.jpg'}, {'end': 3745.619, 'src': 'embed', 'start': 3717.09, 'weight': 1, 'content': [{'end': 3721.392, 'text': 'but maybe at home on my TV I could just get them on there.', 'start': 3717.09, 'duration': 4.302}, {'end': 3728.071, 'text': "So this is how technology should work, but it doesn't currently.", 'start': 3723.649, 'duration': 4.422}, {'end': 3730.892, 'text': 'So we want to start building all this stuff out.', 'start': 3728.751, 'duration': 2.141}, {'end': 3739.376, 'text': 'In the near term, that means better mobile clients, more API usage, and stuff like that.', 'start': 3732.313, 'duration': 7.063}, {'end': 3745.619, 'text': 'You guys have anything that you want to add to that?', 'start': 3744.318, 'duration': 1.301}], 'summary': 'Improving technology for better mobile clients, increased api usage, and more.', 'duration': 28.529, 'max_score': 3717.09, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc3717090.jpg'}, {'end': 3804.286, 'src': 'embed', 'start': 3765.265, 'weight': 5, 'content': [{'end': 3771.19, 'text': "but I'm personally at least happy that people have found a productive way to use Dropbox.", 'start': 3765.265, 'duration': 5.925}, {'end': 3793.024, 'text': 'How 
do you, I mean, to some extent, some of the services that have been shut down do exactly the same thing.', 'start': 3787.538, 'duration': 5.486}, {'end': 3799.971, 'text': 'How are you, as a business, defending against becoming sort of a transporter or whatever you want?', 'start': 3793.524, 'duration': 6.447}, {'end': 3804.286, 'text': "Yeah, so I mean I don't have all the information on that.", 'start': 3801.164, 'duration': 3.122}], 'summary': 'Dropbox pleased with productive usage. defending against becoming a transporter. limited information available.', 'duration': 39.021, 'max_score': 3765.265, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc3765265.jpg'}, {'end': 3919.815, 'src': 'embed', 'start': 3896.25, 'weight': 3, 'content': [{'end': 3902.812, 'text': "And we're always using Dropbox to move the files between the different PowerPoint presentations and reports.", 'start': 3896.25, 'duration': 6.562}, {'end': 3907.026, 'text': "You've got business accounts, too, where you get a large amount of space.", 'start': 3903.203, 'duration': 3.823}, {'end': 3909.367, 'text': "Yeah I don't know what we.", 'start': 3907.126, 'duration': 2.241}, {'end': 3911.029, 'text': 'My account costs $100 a year.', 'start': 3909.367, 'duration': 1.662}, {'end': 3913.03, 'text': "I don't know how much room I have on it.", 'start': 3911.049, 'duration': 1.981}, {'end': 3919.815, 'text': "Yeah, we have an enterprise product that again I don't know a whole lot about what it offers,", 'start': 3914.071, 'duration': 5.744}], 'summary': 'Dropbox is used for file transfer and offers business accounts with large storage space, costing $100/year.', 'duration': 23.565, 'max_score': 3896.25, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc3896250.jpg'}, {'end': 4041.185, 'src': 'embed', 'start': 3995.966, 'weight': 4, 'content': [{'end': 4004.191, 'text': "So we're seeing this 
change in requirements on the back end that rather than being latency tolerant, it's becoming, I guess,", 'start': 3995.966, 'duration': 8.225}, {'end': 4006.453, 'text': 'less latency tolerant over time.', 'start': 4004.191, 'duration': 2.262}, {'end': 4030.717, 'text': 'Who are the main competitors of your company? And how do you think about Box.com? Yeah, so box.net is currently in the enterprise space.', 'start': 4013.697, 'duration': 17.02}, {'end': 4041.185, 'text': 'And as a whole, our strategy is to build the best product that we can and best services that we can,', 'start': 4033.719, 'duration': 7.466}], 'summary': 'Company adapting to decreasing latency tolerance; focusing on product and services in enterprise space.', 'duration': 45.219, 'max_score': 3995.966, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc3995966.jpg'}], 'start': 3605.715, 'title': 'Server file journal evolution and dropbox data service', 'summary': "Discusses the evolution towards using a large number of ssds for server file journaling, aiming to save months of engineer time, appeal to more users, improve mobile clients, and increase api usage. 
it also covers dropbox's data service, user subscriptions, enterprise product, and competition with box.net, highlighting the company's focus on building the best products and services.", 'chapters': [{'end': 3745.619, 'start': 3605.715, 'title': 'Next evolution of server file journal', 'summary': 'Highlights the potential shift towards using a large number of ssds for server file journaling to save months of engineer time, the goal of appealing to more users, and the focus on building better mobile clients and increasing api usage.', 'duration': 139.904, 'highlights': ['The potential shift towards using a large number of SSDs for server file journaling to save months of engineer time', 'The goal of appealing to more users and making more people happy to be using the service all the time', 'The focus on building better mobile clients and increasing API usage']}, {'end': 4094.74, 'start': 3748.331, 'title': 'Dropbox: data service and competition', 'summary': "Discusses dropbox's data service, user subscriptions, enterprise product, and competition with box.net in the enterprise space, while emphasizing the company's focus on building the best products and services.", 'duration': 346.409, 'highlights': ['Dropbox operates as a data service with almost all user subscriptions and offers an enterprise product with appealing features for the enterprise market. Dropbox primarily relies on user subscriptions and provides an enterprise product tailored for the enterprise market.', "The company aims to focus on building the best products and services, not getting distracted by competitors, with box.net being a key competitor in the enterprise space. 
Dropbox's strategy revolves around building superior products and services and acknowledges box.net as a competitor in the enterprise space.", "The interviewee mentions the consulting operation's use of Dropbox for file sharing across different locations in the United States, highlighting the practical business application of the service. Dropbox is utilized by a consulting operation for file sharing across multiple US locations, showcasing its practical business application."]}], 'duration': 489.025, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/PE4gwstWhmc/pics/PE4gwstWhmc3605715.jpg', 'highlights': ['The potential shift towards using a large number of SSDs for server file journaling to save months of engineer time', 'The focus on building better mobile clients and increasing API usage', 'The goal of appealing to more users and making more people happy to be using the service all the time', 'Dropbox operates as a data service with almost all user subscriptions and offers an enterprise product with appealing features for the enterprise market', 'The company aims to focus on building the best products and services, not getting distracted by competitors, with box.net being a key competitor in the enterprise space', "The interviewee mentions the consulting operation's use of Dropbox for file sharing across different locations in the United States, highlighting the practical business application of the service"]}], 'highlights': ["Dropbox's main sync product is used by tens of millions of people and syncs hundreds of millions of files daily.", 'The challenges of working on a fast-changing back end in a rapidly growing environment are discussed, emphasizing the need to adapt to increasing demands and limited resources.', 'The implementation of four megabyte chunks and SHA-256 hashes for file deduplication, along with the utilization of rsync diff for small modifications, showcases the efficient strategies for deduplication and potential 
double-digit percentage savings.', 'The transition from polling to long-polling with the notification servers (notservers) has significantly reduced the load on the servers, keeping them from being crushed by millions of clients.', 'The server file journal records all the edits made to the metadata of files in Dropbox, with changes being uploaded to the meta servers and logged in the journal.', "Reorganizing the primary key optimizes for writes at the expense of reads, which improves performance because of the system's high volume of writes.", 'The potential shift towards using a large number of SSDs for server file journaling to save months of engineer time', 'The focus on building better mobile clients and increasing API usage', 'The company aims to focus on building the best products and services, not getting distracted by competitors, with box.net being a key competitor in the enterprise space']}