title

4. Hashing

description

MIT 6.006 Introduction to Algorithms, Spring 2020
Instructor: Jason Ku
View the complete course: https://ocw.mit.edu/6-006S20
YouTube Playlist: https://www.youtube.com/playlist?list=PLUl4u3cNGP63EdVPNLG3ToM6LaEUuStEY
Hashing allows for faster search and dynamic operations on data structures, arrays, and sorted arrays. This lecture discusses comparison models, decision trees, and hash functions.
License: Creative Commons BY-NC-SA
More information at https://ocw.mit.edu/terms
More courses at https://ocw.mit.edu
Support OCW at http://ow.ly/a1If50zVRlQ
We encourage constructive comments and discussion on OCW’s YouTube and other social media channels. Personal attacks, hate speech, trolling, and inappropriate comments are not allowed and may be removed. More details at https://ocw.mit.edu/comments.

detail

{'title': '4. Hashing', 'heatmap': [{'end': 2543.297, 'start': 2477.403, 'weight': 0.792}, {'end': 2764.407, 'start': 2690.092, 'weight': 1}], 'summary': "Covers the concept of hashing for set data structures, achieving log n time with n log n build overhead, discusses comparison model of computation and decision trees, explores direct access array data structure efficiency, limitations, space complexity, direct access arrays supporting up to the size of the largest key 'u', using hash functions to minimize space usage, challenges of collision resolution, and the concept of universal hash family for better performance.", 'chapters': [{'end': 292.4, 'segs': [{'end': 41.882, 'src': 'embed', 'start': 12.95, 'weight': 1, 'content': [{'end': 17.362, 'text': 'Welcome to the fourth lecture of 6.006.', 'start': 12.95, 'duration': 4.412}, {'end': 20.224, 'text': 'Today, we are going to be talking about hashing.', 'start': 17.362, 'duration': 2.862}, {'end': 35.036, 'text': 'Last lecture on Tuesday, Professor Solomon was talking about set data structures, storing things so that you can query items by their key,', 'start': 20.464, 'duration': 14.572}, {'end': 41.882, 'text': 'by what they intrinsically are, versus what Professor Domain was talking about last week, which was sequence data structures,', 'start': 35.036, 'duration': 6.846}], 'summary': 'Lecture 4 of 6.006 covers hashing and set data structures.', 'duration': 28.932, 'max_score': 12.95, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE12950.jpg'}, {'end': 164.236, 'src': 'embed', 'start': 117.557, 'weight': 0, 'content': [{'end': 123.86, 'text': "So the question then becomes, can I build that data structure faster? That'll be a subject of next week's Thursday lecture.", 'start': 117.557, 'duration': 6.303}, {'end': 129.223, 'text': "But this week, we're going to concentrate on this static find.", 'start': 124.621, 'duration': 4.602}, {'end': 133.786, 'text': 'We got log n, which is an exponential improvement over linear.', 'start': 129.523, 'duration': 4.263}, {'end': 145.252, 'text': "But the question here now becomes, can I do faster than log n time? And what we're going to do at the first part of this lecture is show you that no.", 'start': 136.067, 'duration': 9.185}, {'end': 151.709, 'text': "What's up? No? OK.", 'start': 147.386, 'duration': 4.323}, {'end': 164.236, 'text': "That you can't do faster than log n time in the caveat that we are in a slightly more restricted model of computation than what we introduced to you a couple of weeks ago.", 'start': 151.989, 'duration': 12.247}], 'summary': 'Next week: data structure speed. this week: static find, log n time limit.', 'duration': 46.679, 'max_score': 117.557, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE117557.jpg'}, {'end': 269.166, 'src': 'embed', 'start': 240.949, 'weight': 3, 'content': [{'end': 243.25, 'text': 'And we also want to improve this insert delete.', 'start': 240.949, 'duration': 2.301}, {'end': 251.632, 'text': 'We want to make this data structure dynamic because we might do those operations quite a bit.', 'start': 243.29, 'duration': 8.342}, {'end': 254.713, 'text': 'And so this lecture is about optimizing those three things.', 'start': 251.752, 'duration': 2.961}, {'end': 261.095, 'text': "So first, I'm going to show you that we can't do faster than log n for find.", 'start': 256.153, 'duration': 4.942}, {'end': 262.843, 'text': 'which is a little weird.', 'start': 261.942, 'duration': 0.901}, {'end': 265.805, 'text': 'OK, the model of computation.', 'start': 263.804, 'duration': 2.001}, {'end': 269.166, 'text': "I'm going to be proving this lower bound on.", 'start': 265.805, 'duration': 3.361}], 'summary': 'Lecture focuses on optimizing insert, delete, and find operations, proving a lower bound of log n for find.', 'duration': 28.217, 'max_score': 240.949, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE240949.jpg'}], 'start': 12.95, 'title': 'Hashing and data structure optimization', 'summary': 'Covers the concept of hashing for set data structures, including implementing set interface using unsorted array and a better data structure for faster find operations, achieving log n time with n log n build overhead. it also discusses the limitations of improving find operations, the need for dynamic insert and delete operations, and the proof that the find operation cannot be faster than logarithmic time.', 'chapters': [{'end': 193.272, 'start': 12.95, 'title': 'Lecture 4: hashing', 'summary': 'Covers the concept of hashing for set data structures, including implementing set interface using unsorted array and a better data structure for faster find operations, achieving log n time with n log n build overhead, and exploring the limitations and potential improvements beyond log n time in a restricted model of computation.', 'duration': 180.322, 'highlights': ['The chapter covers the concept of hashing for set data structures. The lecture discusses the concept of hashing for set data structures, focusing on efficient ways to query items by their key.', 'A better data structure for faster find operations is introduced, achieving log n time with n log n build overhead. The lecture presents a more efficient data structure for faster find operations, achieving a time complexity of log n with a build overhead of n log n.', 'Exploring the limitations and potential improvements beyond log n time in a restricted model of computation. The lecture explores the limitations of achieving faster than log n time in a restricted model of computation, while also discussing potential improvements beyond this time complexity.']}, {'end': 292.4, 'start': 193.492, 'title': 'Optimizing data structures lecture', 'summary': 'Discusses the limitations of improving find operations, the need for dynamic insert and delete operations, and the proof that the find operation cannot be faster than logarithmic time.', 'duration': 98.908, 'highlights': ['The lecture focuses on the limitations of improving find operations, the need for dynamic insert and delete operations, and the proof that the find operation cannot be faster than logarithmic time.', 'The data structure lecture aims to optimize find, insert, and delete operations, which are crucial for efficient data management.', 'The chapter explains the impossibility of achieving a faster than logarithmic time complexity for the find operation, emphasizing the fundamental limitation in this aspect.']}], 'duration': 279.45, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE12950.jpg', 'highlights': ['A better data structure for faster find operations is introduced, achieving log n time with n log n build overhead.', 'The lecture discusses the concept of hashing for set data structures, focusing on efficient ways to query items by their key.', 'The lecture explores the limitations of achieving faster than log n time in a restricted model of computation, while also discussing potential improvements beyond this time complexity.', 'The data structure lecture aims to optimize find, insert, and delete operations, which are crucial for efficient data management.', 'The chapter explains the impossibility of achieving a faster than logarithmic time complexity for the find operation, emphasizing the fundamental limitation in this aspect.']}, {'end': 890.309, 'segs': [{'end': 326.422, 'src': 'embed', 'start': 292.64, 'weight': 3, 'content': [{'end': 300.381, 'text': "And the model of computation that's weaker than what we've been talking about previously is what I'm going to call the comparison model.", 'start': 292.64, 'duration': 7.741}, {'end': 311.808, 'text': "And a comparison model means is that the items, the objects I'm storing, I can kind of think of them as black boxes.", 'start': 304.322, 'duration': 7.486}, {'end': 326.422, 'text': "I don't get to touch these things, except the only way that I can distinguish between them is to say given a key and an item or two items,", 'start': 312.348, 'duration': 14.074}], 'summary': 'Comparison model involves treating items as black boxes, distinguished by given key and items.', 'duration': 33.782, 'max_score': 292.64, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE292640.jpg'}, {'end': 371.216, 'src': 'embed', 'start': 346.766, 'weight': 2, 'content': [{'end': 353.77, 'text': 'And actually, all of the search algorithms that we saw on Tuesday were comparison sort algorithms.', 'start': 346.766, 'duration': 7.004}, {'end': 356.491, 'text': 'What you did was, You stepped through the program.', 'start': 353.81, 'duration': 2.681}, {'end': 359.192, 'text': 'At some point, you came to a branch.', 'start': 356.992, 'duration': 2.2}, {'end': 361.233, 'text': 'And you looked at two keys.', 'start': 359.272, 'duration': 1.961}, {'end': 365.234, 'text': 'And you branched based on whether one key was bigger than another.', 'start': 361.733, 'duration': 3.501}, {'end': 367.315, 'text': 'That was a comparison.', 'start': 365.914, 'duration': 1.401}, {'end': 369.195, 'text': 'And then you moved some stuff around.', 'start': 367.615, 'duration': 1.58}, {'end': 371.216, 'text': 'But that was the general paradigm.', 'start': 369.255, 'duration': 1.961}], 'summary': 'The search algorithms discussed were all comparison sort algorithms, involving branching and key comparison.', 'duration': 24.45, 'max_score': 346.766, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE346766.jpg'}, {'end': 647.347, 'src': 'embed', 'start': 613.862, 'weight': 4, 'content': [{'end': 619.18, 'text': "Why do I need one more output? if it's not in there.", 'start': 613.862, 'duration': 5.318}, {'end': 634.428, 'text': "So any correct comparison searching algorithm, I'm doing some comparisons to find this thing, needs to have at least n plus 1 leaves.", 'start': 620.64, 'duration': 13.788}, {'end': 647.347, 'text': "Right?. Otherwise it can't be correct, because I could look up the one that I'm not returning in that set and it would never be able to return that value.", 'start': 636.949, 'duration': 10.398}], 'summary': 'Correct comparison searching algorithm requires at least n+1 leaves for accuracy.', 'duration': 33.485, 'max_score': 613.862, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE613862.jpg'}, {'end': 812.732, 'src': 'embed', 'start': 772.427, 'weight': 0, 'content': [{'end': 773.687, 'text': "But it's log n.", 'start': 772.427, 'duration': 1.26}, {'end': 777.888, 'text': 'The best you can do is if this is a balanced binary tree.', 'start': 773.687, 'duration': 4.201}, {'end': 796.909, 'text': 'So the min height is going to be at least log n height, where the min height is logarithmic.', 'start': 777.988, 'duration': 18.921}, {'end': 799.009, 'text': "So it's actually theta right here.", 'start': 797.409, 'duration': 1.6}, {'end': 803.97, 'text': 'But if I just said height here, I would be lower bounding the height.', 'start': 799.769, 'duration': 4.201}, {'end': 812.732, 'text': 'I could have a linear height if I just chained comparisons down one by one if I was doing a linear search, for example.', 'start': 804.65, 'duration': 8.082}], 'summary': 'In a balanced binary tree, the minimum height is log n, representing logarithmic complexity.', 'duration': 40.305, 'max_score': 772.427, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE772427.jpg'}, {'end': 860.676, 'src': 'embed', 'start': 836.088, 'weight': 1, 'content': [{'end': 842.891, 'text': 'that allows me to do faster, which allows me to do something stronger than comparisons.', 'start': 836.088, 'duration': 6.803}, {'end': 846.252, 'text': 'Comparisons have a constant branching factor.', 'start': 843.311, 'duration': 2.941}, {'end': 856.095, 'text': 'In particular, if I do this operation, this constant time operation, I can branch to two different locations.', 'start': 846.732, 'duration': 9.363}, {'end': 860.676, 'text': "It's like an if kind of situation, if or else.", 'start': 856.555, 'duration': 4.121}], 'summary': 'Faster operations enable stronger comparisons with constant branching factor.', 'duration': 24.588, 'max_score': 836.088, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE836088.jpg'}], 'start': 292.64, 'title': 'Comparison model and algorithm decision tree', 'summary': 'Discusses the comparison model of computation, treating objects as black boxes and using key operations, as well as representing algorithms as decision trees based on comparisons with a minimum height of logarithmic time and non-constant branching for faster computation.', 'chapters': [{'end': 403.33, 'start': 292.64, 'title': 'Comparison model of computation', 'summary': 'Discusses the comparison model of computation, where objects are treated as black boxes and can only be compared using key operations, as seen in search algorithms and sorting operations.', 'duration': 110.69, 'highlights': ['The comparison model of computation involves treating objects as black boxes and only allows comparison operations such as equality, less than, and greater than, with only two possible outputs for each comparator.', 'All search and sorting algorithms discussed previously operate within the comparison model, using comparison sort algorithms that involve branching based on key comparisons.', 'The comparison model restricts operations to only comparing keys and does not allow direct access to the contents of the objects being compared.']}, {'end': 890.309, 'start': 403.59, 'title': 'Algorithm decision tree and comparison model', 'summary': 'Explains the concept of representing an algorithm as a decision tree based on comparisons, with a minimum height of logarithmic time for finding a key in a set and the need for branching a non-constant amount for faster computation.', 'duration': 486.719, 'highlights': ['The number of comparisons in an execution of the algorithm is just along a path from the root to a leaf, requiring at least n plus 1 leaves for a correct comparison searching algorithm. The number of comparisons in an execution is directly related to the path from the root to a leaf, necessitating at least n plus 1 leaves for correctness.', 'The minimum height of any binary tree with at least n plus 1 leaves is logarithmic, indicating that in the worst case, the algorithm has to do as many comparisons as the longest root-to-leaf path, which is equivalent to the height of the tree. The minimum height of a binary tree with n plus 1 leaves is logarithmic, implying that the algorithm needs to do comparisons equivalent to the longest root-to-leaf path, i.e., the height of the tree.', 'In order to achieve faster computation than logarithmic time, the need to branch a non-constant amount arises, as comparisons have a constant branching factor, and branching by a non-constant amount is required to reduce the time complexity. To attain faster computation than logarithmic time, the necessity to branch a non-constant amount arises, as comparisons have a constant branching factor, demanding branching by a non-constant amount to lower the time complexity.']}], 'duration': 597.669, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE292640.jpg', 'highlights': ['The minimum height of any binary tree with at least n plus 1 leaves is logarithmic, indicating that in the worst case, the algorithm has to do as many comparisons as the longest root-to-leaf path, which is equivalent to the height of the tree.', 'In order to achieve faster computation than logarithmic time, the need to branch a non-constant amount arises, as comparisons have a constant branching factor, and branching by a non-constant amount is required to reduce the time complexity.', 'All search and sorting algorithms discussed previously operate within the comparison model, using comparison sort algorithms that involve branching based on key comparisons.', 'The comparison model of computation involves treating objects as black boxes and only allows comparison operations such as equality, less than, and greater than, with only two possible outputs for each comparator.', 'The number of comparisons in an execution of the algorithm is just along a path from the root to a leaf, requiring at least n plus 1 leaves for a correct comparison searching algorithm.', 'The comparison model restricts operations to only comparing keys and does not allow direct access to the contents of the objects being compared.']}, {'end': 1234.204, 'segs': [{'end': 922.016, 'src': 'embed', 'start': 891.93, 'weight': 1, 'content': [{'end': 904.23, 'text': 'We had this really neat operation in the random access machine that we could randomly go to any place in memory in constant time based on a number.', 'start': 891.93, 'duration': 12.3}, {'end': 914.448, 'text': 'That was a super powerful thing, because within a single constant time operation, I could go to any space in memory.', 'start': 908.322, 'duration': 6.126}, {'end': 922.016, 'text': "That's potentially much larger than linear branching factor, depending on the size of my model and the size of my machine.", 'start': 915.329, 'duration': 6.687}], 'summary': 'Random access machine allows constant time memory access for large models and machines.', 'duration': 30.086, 'max_score': 891.93, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE891930.jpg'}, {'end': 1046.01, 'src': 'embed', 'start': 989.815, 'weight': 0, 'content': [{'end': 996.202, 'text': "It's really no different than the arrays that we've been talking about earlier in the class.", 'start': 989.815, 'duration': 6.387}, {'end': 999.825, 'text': "We've got an array.", 'start': 996.222, 'duration': 3.603}, {'end': 1009.775, 'text': "And if I have an item here with key equals 10, I'll stick it here in the 10th place.", 'start': 999.845, 'duration': 9.93}, {'end': 1012.478, 'text': 'Now, I can only now store one item.', 'start': 1010.156, 'duration': 2.322}, {'end': 1016.888, 'text': 'with the key 10 in my thing.', 'start': 1014.386, 'duration': 2.502}, {'end': 1020.431, 'text': "And that's one of the stipulations we had on our set data structures.", 'start': 1017.369, 'duration': 3.062}, {'end': 1025.576, 'text': "If we tried to insert something with the same key as something already stored there, we're going to replace the item.", 'start': 1020.511, 'duration': 5.065}, {'end': 1028.898, 'text': "That's what the semantics of our set interface was.", 'start': 1026.377, 'duration': 2.521}, {'end': 1030.18, 'text': "But that's OK.", 'start': 1029.579, 'duration': 0.601}, {'end': 1033.683, 'text': "That's satisfying the conditions of our set interface.", 'start': 1030.2, 'duration': 3.483}, {'end': 1037.626, 'text': "So if we store it there, that's fantastic.", 'start': 1034.824, 'duration': 2.802}, {'end': 1044.309, 'text': 'How long does it take to find if we have an item with the key 10? It takes constant time.', 'start': 1038.027, 'duration': 6.282}, {'end': 1046.01, 'text': 'Worst case, great.', 'start': 1044.848, 'duration': 1.162}], 'summary': 'Arrays with key-value pairs, constant time to find item with key 10', 'duration': 56.195, 'max_score': 989.815, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE989815.jpg'}, {'end': 1112.709, 'src': 'embed', 'start': 1076.968, 'weight': 4, 'content': [{'end': 1086.012, 'text': "So let's say I'm storing, I don't know, a number associated with the 300 or 400 of you that are in this classroom.", 'start': 1076.968, 'duration': 9.044}, {'end': 1089.874, 'text': "But I'm storing your MIT IDs.", 'start': 1088.353, 'duration': 1.521}, {'end': 1096.017, 'text': 'How big are those numbers? Those are like nine-digit numbers, pretty long numbers.', 'start': 1090.394, 'duration': 5.623}, {'end': 1104.143, 'text': 'So what I would need to do if I were storing your keys as MIT IDs,', 'start': 1097.338, 'duration': 6.805}, {'end': 1112.709, 'text': 'I would need an array that has indices that span the entire space of nine-digit numbers.', 'start': 1104.143, 'duration': 8.566}], 'summary': 'Storing mit ids as nine-digit numbers for 300-400 students.', 'duration': 35.741, 'max_score': 1076.968, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE1076968.jpg'}, {'end': 1175.075, 'src': 'embed', 'start': 1143.927, 'weight': 3, 'content': [{'end': 1148.669, 'text': "And what I'm going to use as a variable to talk about the size of the numbers I'm storing,", 'start': 1143.927, 'duration': 4.742}, {'end': 1152.831, 'text': "I'm going to say u is the maximum size of any number that I'm storing.", 'start': 1148.669, 'duration': 4.162}, {'end': 1157.813, 'text': "That's the size of the universe of space of keys that I'm storing.", 'start': 1153.791, 'duration': 4.022}, {'end': 1166.022, 'text': 'Does that make sense? OK, so to instantiate a direct access array of that size, I have to allocate that amount of space.', 'start': 1157.873, 'duration': 8.149}, {'end': 1175.075, 'text': "And so if that is much bigger than n, then I'm kind of screwed because I'm using much more space.", 'start': 1166.964, 'duration': 8.111}], 'summary': "Using 'u' as the maximum size of stored numbers, allocating space according to 'u' could lead to excessive usage.", 'duration': 31.148, 'max_score': 1143.927, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE1143927.jpg'}, {'end': 1240.328, 'src': 'embed', 'start': 1214.811, 'weight': 6, 'content': [{'end': 1223.656, 'text': 'you have code that can take a set data structure and implement a sequence data structure and take a sequence data structure and implement a set data structure.', 'start': 1214.811, 'duration': 8.845}, {'end': 1226.719, 'text': "they just won't necessarily have very good runtime.", 'start': 1224.778, 'duration': 1.941}, {'end': 1234.204, 'text': 'So this direct access array semantics is really just good for these specific set operations.', 'start': 1226.799, 'duration': 7.405}, {'end': 1240.328, 'text': "Does that make sense? Yeah? What is u? JASON KU- U is the size of the largest key that I'm allowed to store.", 'start': 1234.224, 'duration': 6.104}], 'summary': 'Code can transform set to sequence and vice versa, with limited runtime. direct array access ideal for set operations.', 'duration': 25.517, 'max_score': 1214.811, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE1214811.jpg'}], 'start': 891.93, 'title': 'Direct access array and space complexity', 'summary': 'Discusses the direct access array data structure for constant time operations, emphasizing its efficiency and limitations in handling a range of keys. it also explores the space complexity issue when the universe of keys is larger than the stored elements, providing a comparative analysis.', 'chapters': [{'end': 1076.948, 'start': 891.93, 'title': 'Direct access array data structure', 'summary': 'Discusses the direct access array as a powerful data structure for constant time operations in finding, inserting, and deleting items based on a key, offering efficient performance but limited by the range of keys.', 'duration': 185.018, 'highlights': ['Direct access array allows constant time operations for finding, inserting, and deleting items based on a key, satisfying the conditions of the set interface.', 'The operation is limited by the range of keys, as it requires knowing the highest possible key value to allocate memory efficiently.', "The concept is simpler than hashing and relies on storing items at specific indices in an array, enabling quick access based on the item's key value.", 'The direct access array operation is super powerful, offering constant time access to any space in memory, potentially larger than a linear branching factor, depending on the size of the model and machine.']}, {'end': 1234.204, 'start': 1076.968, 'title': 'Direct access array and space complexity', 'summary': 'Explains the concept of direct access array, highlighting the challenge of space allocation when the size of the universe of keys is much larger than the number of elements to be stored, with a comparison between the size of the universe of keys and the amount of space required for the direct access array.', 'duration': 157.236, 'highlights': ['The size of a direct access array needed for storing MIT IDs, which are nine-digit numbers, would be 10^9, far exceeding the actual number of individuals in the classroom.', "The challenge arises when the maximum size of any number being stored, represented as 'u', is much larger than the number of elements to be stored, leading to inefficient space allocation.", 'A direct access array is suitable for specific set operations, but not efficient for implementing sequence data structures.']}], 'duration': 342.274, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE891930.jpg', 'highlights': ['Direct access array allows constant time operations for finding, inserting, and deleting items based on a key, satisfying the conditions of the set interface.', 'The direct access array operation is super powerful, offering constant time access to any space in memory, potentially larger than a linear branching factor, depending on the size of the model and machine.', "The concept is simpler than hashing and relies on storing items at specific indices in an array, enabling quick access based on the item's key value.", 'The operation is limited by the range of keys, as it requires knowing the highest possible key value to allocate memory efficiently.', 'The size of a direct access array needed for storing MIT IDs, which are nine-digit numbers, would be 10^9, far exceeding the actual number of individuals in the classroom.', "The challenge arises when the maximum size of any number being stored, represented as 'u', is much larger than the number of elements to be stored, leading to inefficient space allocation.", 'A direct access array is suitable for specific set operations, but not efficient for implementing sequence data structures.']}, {'end': 1745.54, 'segs': [{'end': 1266.69, 'src': 'embed', 'start': 1234.224, 'weight': 1, 'content': [{'end': 1240.328, 'text': "Does that make sense? Yeah? What is u? JASON KU- U is the size of the largest key that I'm allowed to store.", 'start': 1234.224, 'duration': 6.104}, {'end': 1247.252, 'text': 'Does that make sense? The direct access array is supporting up to u size keys.', 'start': 1240.348, 'duration': 6.904}, {'end': 1250.574, 'text': "Does that make sense? OK, we're going to move on for a second.", 'start': 1247.272, 'duration': 3.302}, {'end': 1266.69, 'text': "So that's the problem, right? Largest key, we're assuming integers here, integer keys.", 'start': 1250.955, 'duration': 15.735}], 'summary': 'Jason ku is the size of the largest key allowed to store, supporting up to u size keys.', 'duration': 32.466, 'max_score': 1234.224, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE1234224.jpg'}, {'end': 1373.588, 'src': 'embed', 'start': 1323.241, 'weight': 0, 'content': [{'end': 1327.262, 'text': "That's how many 2 to the w is the number of addresses I can access.", 'start': 1323.241, 'duration': 4.021}, {'end': 1333.724, 'text': "So, implicitly, if I'm going to be able to use this direct access array, I need to make sure that the?", 'start': 1327.582, 'duration': 6.142}, {'end': 1341.607, 'text': 'u is less than 2 to the w if I want these operations to run in constant time.', 'start': 1333.724, 'duration': 7.883}, {'end': 1347.089, 'text': "If I have keys that are much larger than this, I'm going to need to do something else.", 'start': 1341.627, 'duration': 5.462}, {'end': 1350.352, 'text': 'OK, but this is kind of the assumption.', 'start': 1347.95, 'duration': 2.402}, {'end': 1351.813, 'text': 'In this class.', 'start': 1350.672, 'duration': 1.141}, {'end': 1359.578, 'text': 'when we give you an array of integers or an array of strings or something like that on your problem set or on an exam, the assumption is,', 'start': 1351.813, 'duration': 7.765}, {'end': 1369.445, 'text': 'unless we give you bounds on the size of those things, like the number of characters in your string or the size of the number,', 'start': 1359.578, 'duration': 9.867}, {'end': 1373.588, 'text': 'you can assume that those things will fit in one word of memory.', 'start': 1369.445, 'duration': 4.143}], 'summary': 'To access addresses, u must be less than 2 to the w for constant time operations.', 'duration': 50.347, 'max_score': 1323.241, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE1323241.jpg'}, {'end': 1480.96, 'src': 'embed', 'start': 1449.085, 'weight': 4, 'content': [{'end': 1453.147, 'text': "But there's another trick that I'm looking for right now.", 'start': 1449.085, 'duration': 4.062}, {'end': 1458.771, 'text': 'We have a lot of space that we would need to allocate for this data structure.', 'start': 1453.327, 'duration': 5.444}, {'end': 1469.156, 'text': "What's an alternative? Instead of allocating a lot of space, we allocate less space.", 'start': 1459.751, 'duration': 9.405}, {'end': 1471.838, 'text': "Let's allocate less space.", 'start': 1470.697, 'duration': 1.141}, {'end': 1480.96, 'text': 'All right, so we have a really like, This is our space of keys, u.', 'start': 1471.918, 'duration': 9.042}], 'summary': 'Exploring alternative to allocate less space for data structure.', 'duration': 31.875, 'max_score': 1449.085, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE1449085.jpg'}, {'end': 1655.297, 'src': 'embed', 'start': 1628.117, 'weight': 3, 'content': [{'end': 1635.963, 'text': "So if I choose a bad function here, then I'll have to store n things at the same index location.", 'start': 1628.117, 'duration': 7.846}, {'end': 1641.948, 'text': "And if I go there, I have to check to see whether any of those are the things that I'm looking for.", 'start': 1636.003, 'duration': 5.945}, {'end': 1643.228, 'text': "I haven't gained anything.", 'start': 1641.968, 'duration': 1.26}, {'end': 1649.253, 'text': 'I really want a hash function that will evenly distribute keys over this space.', 'start': 1643.769, 'duration': 5.484}, {'end': 1655.297, 'text': 'Does that make sense? But we have a problem here.', 'start': 1649.273, 'duration': 6.024}], 'summary': 'Seeking a hash function for even key distribution over space, facing a problem.', 'duration': 27.18, 'max_score': 1628.117, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE1628117.jpg'}], 'start': 1234.224, 'title': 'Direct access arrays and hash functions', 'summary': "Discusses direct access arrays supporting up to the size of the largest key 'u' and limitations to storing only integer keys, along with the use of hash functions to minimize space usage, handle collisions, and evenly distribute keys over the space.", 'chapters': [{'end': 1290.751, 'start': 1234.224, 'title': 'Direct access array and integer keys', 'summary': "Discusses the concept of direct access array supporting up to the size of the largest key 'u' and the limitation to storing only integer keys for addressing, making an assumption on the input data.", 'duration': 56.527, 'highlights': ["The direct access array supports up to the size of the largest key 'u'.", 'The limitation to storing only integer keys for addressing is made, assuming inputs can only store integers.']}, {'end': 1745.54, 'start': 1291.632, 'title': 'Hash functions and direct access arrays', 'summary': 'Discusses the use of hash functions and direct access arrays to minimize space usage and handle collisions, emphasizing the need for a hash function that evenly distributes keys over the space.', 'duration': 453.908, 'highlights': ['The number of addresses that can be accessed is 2 to the power of the word size of the machine, implying that for operations to run in constant time, the key size should be less than 2 to the power of the word size.', 'The assumption in the class is that unless bounds are given, data will fit in one word of memory, defined by the word size of the machine.', 'To address the problem of using too much space with a large universe of keys, the suggestion is to store multiple values at each place in the direct access array, but this does not resolve the space usage issue.', 'An alternative to allocating a lot of space is to allocate less space by mapping the large space of keys to a smaller range using a function, but this could lead to multiple keys mapping to the same index location, requiring handling of collisions.', 'The need for a hash function that evenly distributes keys over the space is emphasized, as a bad function could result in having to store multiple items at the same index location, negating the benefits of the direct access array.', 'The dilemma of handling collisions in a direct access array involves considering whether to use linked lists, which are not efficient for direct access by index, thus contradicting the purpose of the direct access array.']}], 'duration': 511.316, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE1234224.jpg', 'highlights': ['The number of addresses that can be accessed is 2 to the power of the word size of the machine, implying that for operations to run in constant time, the key size should be less than 2 to the power of the word size.', "The direct access array supports up to the size of the largest key 'u'.", 'The assumption in the class is that unless bounds are given, data will fit in one word of memory, defined by the word size of the machine.', 'The need for a hash function that evenly distributes keys over the space is emphasized, as a bad function could result in having to store multiple items at the same index location, negating the benefits of the direct access array.', 'An alternative to allocating a lot of space is to allocate less space by mapping the large space of keys to a smaller range using a function, but this could lead to multiple keys mapping to the same index location, requiring handling of collisions.']}, {'end': 2026.313, 'segs': [{'end': 1871.41, 'src': 'embed', 'start': 1779.475, 'weight': 1, 'content': [{'end': 1784.022, 'text': "Maybe if I choose m to be larger than n, there's going to be extra space in here.", 'start': 1779.475, 'duration': 4.547}, {'end': 1787.507, 'text': "I'll just stick it somewhere else in the existing array.", 'start': 1785.243, 'duration': 2.264}, {'end': 1792.356, 'text': 'How I find an open space is a little complicated.', 'start': 1789.514, 'duration': 2.842}, {'end': 1804.164, 'text': "But this is a technique called open addressing, which is much more common than the technique we're going to be talking about today in implementations.", 'start': 1792.657, 'duration': 11.507}, {'end': 1810.928, 'text': 'Python uses an open addressing scheme, which is essentially find another place in the array to put this collision.', 'start': 1804.244, 'duration': 6.684}, {'end': 1816.532, 'text': "But open addressing is notoriously difficult to analyze, so we're not going to do that in this class.", 'start': 1811.849, 'duration': 4.683}, {'end': 1822.156, 'text': "There's a much easier technique that we have an implementation for you in the recitation handouts.", 'start': 1816.552, 'duration': 5.604}, {'end': 1829.9, 'text': "It's what your colleague up here I can't find him over there was saying was,", 'start': 1823.436, 'duration': 6.464}, {'end': 1837.345, 'text': 'instead of storing it somewhere else in the existing direct access array down here, which we usually call the hash table,', 'start': 1829.9, 'duration': 7.445}, {'end': 1850.164, 'text': "Instead of storing it somewhere else in that hash table, we'll instead, at that key, store a pointer to another data structure,", 'start': 1841.195, 'duration': 8.969}, {'end': 1857.151, 'text': 'some other data structure that can store a bunch of things, just like any sequence data structure, like a dynamic array or a linked list or anything.', 'start': 1850.164, 'duration': 6.987}, {'end': 1860.275, 'text': 'All I need to do is be able to stick a bunch of things on there.', 'start': 1857.692, 'duration': 2.583}, {'end': 1863.365, 'text': 'when there are collisions.', 'start': 1862.244, 'duration': 1.121}, {'end': 1871.41, 'text': "And then when I go up to look for that thing, I'll just look through all of the things in that data structure and see if my key exists.", 'start': 1863.545, 'duration': 7.865}], 'summary': 'Open addressing and chaining are techniques for handling collisions in hash tables, with open addressing being more common but difficult to analyze. chaining involves using a pointer to another data structure to store colliding items.', 'duration': 91.935, 'max_score': 1779.475, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE1779475.jpg'}, {'end': 1971.094, 'src': 'embed', 'start': 1944.279, 'weight': 0, 'content': [{'end': 1949.022, 'text': 'You can put a sorted array there to make it easier to check whether the key is there.', 'start': 1944.279, 'duration': 4.743}, {'end': 1950.363, 'text': 'You can put anything you want there.', 'start': 1949.222, 'duration': 1.141}, {'end': 1964.17, 'text': "The point of this lecture is going to try to show that there's a choice of hash function I can make that makes sure that these chains are small so that it really doesn't matter how I store them there.", 'start': 1950.703, 'duration': 13.467}, {'end': 1971.094, 'text': "Because if there's a constant number of things stored there, I can just look at all of them and do whatever I want.", 'start': 1965.211, 'duration': 5.883}], 'summary': 'Hash function choice minimizes chain size for efficient storage and retrieval.', 'duration': 26.815, 'max_score': 1944.279, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE1944279.jpg'}], 'start': 1747.661, 'title': 'Hash functions and efficient storage', 'summary': 'Discusses challenges of collision resolution in hash functions, including open addressing, and alternative solutions. it also covers using a hash table to store pointers to additional data structures for efficient collision handling and the concept of chaining in hash tables.', 'chapters': [{'end': 1829.9, 'start': 1747.661, 'title': 'Hash functions and collision resolution', 'summary': 'Discusses the challenges of collision resolution in hash functions, including open addressing and its difficulty in analysis, and alternative solutions for resolving collisions.', 'duration': 82.239, 'highlights': ['Open addressing is a common technique used in Python for resolving collisions in hash functions, but it is notoriously difficult to analyze.', 'Choosing a larger size for the hash table than the number of elements can lead to extra space, which can be utilized for collision resolution by finding another place in the array to put the collision.', 'The chapter introduces the concept of open addressing as a technique for resolving collisions in hash functions, which is more common but difficult to analyze.']}, {'end': 1895.072, 'start': 1829.9, 'title': 'Data structure for efficient storage', 'summary': 'Discusses using a hash table to store pointers to additional data structures, such as dynamic arrays or linked lists, in order to efficiently handle collisions and ensure short chains for better performance.', 'duration': 65.172, 'highlights': ['Storing pointers to additional data structures in a hash table to handle collisions and ensure efficient storage.', 'Emphasizing the need for short chains in the additional data structures to optimize performance.', 'Considering the use of dynamic arrays or linked lists as potential additional data structures for efficient storage.']}, {'end': 2026.313, 'start': 1895.712, 'title': 'Hash table chaining', 'summary': 'Discusses the concept of chaining in hash tables, explaining the use of different data structures to store hashed values, emphasizing the importance of choosing an appropriate hash function to ensure small chains and constant time lookups.', 'duration': 130.601, 'highlights': ['The lecture explains the concept of chaining in hash tables, showcasing the utilization of various data structures like linked lists, dynamic arrays, and sorted arrays to store hashed values.', 'The importance of selecting an effective hash function to ensure small chains is emphasized, enabling constant time lookups regardless of how the data is stored.', 'The discussion also touches on the trade-offs involved in choosing the appropriate data structure for chaining, highlighting implementation details and associated overhead.']}], 'duration': 278.652, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE1747661.jpg', 'highlights': ['The lecture explains the concept of chaining in hash tables, showcasing the utilization of various data structures like linked lists, dynamic arrays, and sorted arrays to store hashed values.', 'Considering the use of dynamic arrays or linked lists as potential additional data structures for efficient storage.', 'Choosing a larger size for the hash table than the number of elements can lead to extra space, which can be utilized for collision resolution by finding another place in the array to put the collision.', 'The importance of selecting an effective hash function to ensure small chains is emphasized, enabling constant time lookups regardless of how the data is stored.', 'Storing pointers to additional data structures in a hash table to handle collisions and ensure efficient storage.', 'Open addressing is a common technique used in Python for resolving collisions in hash functions, but it is notoriously difficult to analyze.']}, {'end': 3167.627, 'segs': [{'end': 2081.634, 'src': 'embed', 'start': 2057.435, 'weight': 1, 'content': [{'end': 2067.103, 'text': 'then there is the possibility that I, for some input, all of the things in my set go directly to the same hashed index value.', 'start': 2057.435, 'duration': 9.668}, {'end': 2068.524, 'text': "So that ain't great.", 'start': 2067.763, 'duration': 0.761}, {'end': 2070.245, 'text': "Let's ignore that for a second.", 'start': 2069.063, 'duration': 1.182}, {'end': 2079.572, 'text': "What's the easiest way to get down from this large space of keys down to a small one? What's the easiest thing you could do? Yeah? Modulus.", 'start': 2070.565, 'duration': 9.007}, {'end': 2081.634, 'text': 'Great This is called the division method.', 'start': 2080.033, 'duration': 1.601}], 'summary': 'Possibility of all set items hashing to same index. easiest way down is division method.', 'duration': 24.199, 'max_score': 2057.435, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE2057435.jpg'}, {'end': 2285.589, 'src': 'embed', 'start': 2256.102, 'weight': 2, 'content': [{'end': 2264.919, 'text': "Does that make sense? In some sense, I want But it's impossible for me to pick a fixed hash function that will achieve this.", 'start': 2256.102, 'duration': 8.817}, {'end': 2270.023, 'text': 'Because I just told you that if u is large, this is u.', 'start': 2265.239, 'duration': 4.784}, {'end': 2275.247, 'text': 'If u is large, then there exists inputs that map everything to one place.', 'start': 2270.023, 'duration': 5.224}, {'end': 2278.689, 'text': "So I feel I'm screwed.", 'start': 2275.267, 'duration': 3.422}, {'end': 2280.471, 'text': "There's no way to solve this problem.", 'start': 2279.31, 'duration': 1.161}, {'end': 2285.589, 'text': "That's true if I want a deterministic hash function.", 'start': 2283.168, 'duration': 2.421}], 'summary': 'Challenges in finding a fixed hash function for large inputs.', 'duration': 29.487, 'max_score': 2256.102, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE2256102.jpg'}, {'end': 2393.512, 'src': 'embed', 'start': 2366.718, 'weight': 4, 'content': [{'end': 2373.603, 'text': 'What I can do is fix a family of hash functions where if I choose one randomly, I get good performance.', 'start': 2366.718, 'duration': 6.885}, {'end': 2382.669, 'text': "And so the hash function I'm going to use, and we're going to spend the rest of the time on, is what I call a universal hash function.", 'start': 2373.663, 'duration': 9.006}, {'end': 2387.253, 'text': 'It satisfies what we call a universal hash property.', 'start': 2382.909, 'duration': 4.344}, {'end': 2393.512, 'text': 'universal hash function.', 'start': 2388.65, 'duration': 4.862}], 'summary': 'Developing a family of hash functions with good performance, focusing on universal hash function.', 'duration': 26.794, 'max_score': 2366.718, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE2366718.jpg'}, {'end': 2543.297, 'src': 'heatmap', 'start': 2477.403, 'weight': 0.792, 'content': [{'end': 2486.887, 'text': 'for some random choice of a, b in this larger range.', 'start': 2477.403, 'duration': 9.484}, {'end': 2494.15, 'text': 'This is a lot of notation here.', 'start': 2490.128, 'duration': 4.022}, {'end': 2498.804, 'text': 'Essentially, what this is saying is I have a hash family.', 'start': 2494.57, 'duration': 4.234}, {'end': 2508.833, 'text': "It's parameterized by the length of my hash function and some fixed large random prime that's bigger than u.", 'start': 2500.466, 'duration': 8.367}, {'end': 2510.674, 'text': "I'm going to pick some large prime number.", 'start': 2508.833, 'duration': 1.841}, {'end': 2515.418, 'text': "And that's going to be fixed when I make the hash table.", 'start': 2512.496, 'duration': 2.922}, {'end': 2529.727, 'text': "And then when I instantiate the hash table, I'm going to choose randomly one of these things by choosing a random a and a random b from this range.", 'start': 2519.061, 'duration': 10.666}, {'end': 2543.297, 'text': "Does that make sense? Is a not equal to 0? If I had 0 here, I kind of lose the key information, and that's no good.", 'start': 2530.468, 'duration': 12.829}], 'summary': 'Hash family parameterized by hash function length, using large random prime for hash table instantiation.', 'duration': 65.894, 'max_score': 2477.403, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE2477403.jpg'}, {'end': 2642.891, 'src': 'embed', 'start': 2595.273, 'weight': 0, 'content': [{'end': 2601.038, 'text': "and I'm choosing one randomly within that family, is universal.", 'start': 2595.273, 'duration': 5.765}, {'end': 2608.904, 'text': 'And universality says that what is the property of universality?', 'start': 2601.198, 'duration': 7.706}, {'end': 2617.178, 'text': 'It means that the probability, by choosing a hash function from this hash family,', 'start': 2610.472, 'duration': 6.706}, {'end': 2637.695, 'text': 'that a certain key collides with another key is less than or equal to 1 over m for all any different two keys in my universe.', 'start': 2617.178, 'duration': 20.517}, {'end': 2642.891, 'text': 'Does that make sense?', 'start': 2642.21, 'duration': 0.681}], 'summary': 'Universality ensures probability of key collision <= 1/m for any 2 keys.', 'duration': 47.618, 'max_score': 2595.273, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE2595273.jpg'}, {'end': 2720.658, 'src': 'embed', 'start': 2690.092, 'weight': 3, 'content': [{'end': 2704.942, 'text': 'But we can use this result to show that if we use this universal hash family, that the length of our chains is expected to be constant length.', 'start': 2690.092, 'duration': 14.85}, {'end': 2708.264, 'text': "So we're going to use this property to prove that.", 'start': 2705.923, 'duration': 2.341}, {'end': 2712.667, 'text': "How do we prove that? We're going to do a little probability.", 'start': 2710.145, 'duration': 2.522}, {'end': 2720.658, 'text': "OK, so how are we going to prove that? I'm going to define a random variable, an indicator random variable.", 'start': 2713.595, 'duration': 7.063}], 'summary': 'Using universal hash family to prove constant chain length.', 'duration': 30.566, 'max_score': 2690.092, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE2690092.jpg'}, {'end': 2764.407, 'src': 'heatmap', 'start': 2690.092, 'weight': 1, 'content': [{'end': 2704.942, 'text': 'But we can use this result to show that if we use this universal hash family, that the length of our chains is expected to be constant length.', 'start': 2690.092, 'duration': 14.85}, {'end': 2708.264, 'text': "So we're going to use this property to prove that.", 'start': 2705.923, 'duration': 2.341}, {'end': 2712.667, 'text': "How do we prove that? We're going to do a little probability.", 'start': 2710.145, 'duration': 2.522}, {'end': 2720.658, 'text': "OK, so how are we going to prove that? I'm going to define a random variable, an indicator random variable.", 'start': 2713.595, 'duration': 7.063}, {'end': 2722.879, 'text': 'Does anyone remember what an indicator random variable is?', 'start': 2720.758, 'duration': 2.121}, {'end': 2732.203, 'text': "Yeah, it's a variable that, with some amount of probability, is 1, and 1 minus that probability is 0, right?", 'start': 2723.839, 'duration': 8.364}, {'end': 2737.245, 'text': "So I'm going to define this indicator random variable.", 'start': 2733.263, 'duration': 3.982}, {'end': 2749.784, 'text': 'xij is a random variable over my choice over choice of a hash function in my hash family.', 'start': 2737.245, 'duration': 12.539}, {'end': 2764.407, 'text': 'And what does this mean? It means xij equals 1 if hash ki equals hkj.', 'start': 2750.744, 'duration': 13.663}], 'summary': 'Using a universal hash family to prove constant chain length with probability and indicator random variable', 'duration': 74.315, 'max_score': 2690.092, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE2690092.jpg'}, {'end': 3146.842, 'src': 'embed', 'start': 3122.52, 'weight': 5, 'content': [{'end': 3130.988, 'text': "If we're growing the number of things we're storing in this thing, it's possible that as we grow n for a fixed m,", 'start': 3122.52, 'duration': 8.468}, {'end': 3136.012, 'text': 'this thing will stop being linear in n.', 'start': 3130.988, 'duration': 5.024}, {'end': 3144.741, 'text': 'Well, then all we have to do is, if we get too far, we rebuild the entire thing, the entire hash table, with the new m,', 'start': 3136.012, 'duration': 8.729}, {'end': 3146.842, 'text': 'just like we did with the dynamic array.', 'start': 3144.741, 'duration': 2.101}], 'summary': 'The hash table may stop being linear in n as the number of items grows, leading to a need to rebuild with the new m.', 'duration': 24.322, 'max_score': 3122.52, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE3122520.jpg'}], 'start': 2026.854, 'title': 'Universal hashing and chain length', 'summary': 'Discusses the concept of universal hash family, probability of collisions, and the constant expected chain length in a hash table with m being at least linear in n. it also explores the need for a universal hash function and the use of random hash functions for better performance.', 'chapters': [{'end': 2255.242, 'start': 2026.854, 'title': 'Hash function and data structure', 'summary': "Discusses the selection of a good hash function for a data structure, focusing on the division method and its impact on collision and performance, with a comparison to python's hash function implementation.", 'duration': 228.388, 'highlights': ['The division method, using k mod m, is an easy way to reduce a large space of keys to a smaller one, though its performance relies on the uniform distribution of keys in the larger space.', "Python's hash function involves jumbling up the key and then performing a modulus operation, with some instances resulting in poor performance due to chain lengths and collisions.", 'The emphasis of the class is on proving the theoretical soundness of the hash function and data structure, aiming for performance independent of the stored inputs or keys.']}, {'end': 2430.203, 'start': 2256.102, 'title': 'Choosing universal hash functions', 'summary': 'Discusses the need for a universal hash function due to the impracticality of choosing a fixed hash function, and explores the concept of using a random hash function to achieve better performance.', 'duration': 174.101, 'highlights': ['The chapter explains the impracticality of choosing a fixed hash function due to the existence of inputs that map everything to one place, leading to the need for a universal hash function.', 'It explores the concept of using a nondeterministic approach where a hash function is chosen randomly later, making it difficult for adversaries to give bad inputs.', 'The chapter introduces the concept of a universal hash function, which is a family of hash functions that, when chosen randomly, provides good performance.']}, {'end': 3167.627, 'start': 2432.644, 'title': 'Universal hashing and chain length', 'summary': 'Explains the concept of universal hash family, where a randomly chosen hash function from the family ensures collisions with a probability less than 1/m, and uses this property to prove that the expected length of chains in a hash table is constant, under the condition that m is at least linear in n.', 'duration': 734.983, 'highlights': ['A universal hash family ensures collisions with a probability less than 1/m for any two keys in the universe space, providing a measure of how well distributed the hash functions are. This property ensures that the probability of collisions between keys is less than 1/m, indicating a well-distributed hash function.', 'Using the property of universality, it is proven that the expected length of chains in a hash table is constant when m is at least linear in n. The chapter demonstrates that when m is at least linear in n, the expected length of chains in a hash table remains constant, leveraging the property of universality.', 'In the case of dynamically growing the number of stored items, it is suggested to rebuild the hash table with a new m if the chain lengths stop being linear in n, ensuring the constant expected chain length. To maintain a constant expected chain length, it is recommended to rebuild the hash table with a new m when the number of stored items grows dynamically and the chain lengths deviate from being linear in n.']}], 'duration': 1140.773, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Nu8YGneFCWE/pics/Nu8YGneFCWE2026854.jpg', 'highlights': ['A universal hash family ensures collisions with a probability less than 1/m for any two keys', 'The division method, using k mod m, is an easy way to reduce a large space of keys to a smaller one', 'The chapter explains the impracticality of choosing a fixed hash function due to the existence of inputs that map everything to one place', 'Using the property of universality, it is proven that the expected length of chains in a hash table is constant when m is at least linear in n', 'The chapter introduces the concept of a universal hash function, which is a family of hash functions that, when chosen randomly, provides good performance', 'In the case of dynamically growing the number of stored items, it is suggested to rebuild the hash table with a new m if the chain lengths stop being linear in n']}], 'highlights': ['Direct access array allows constant time operations for finding, inserting, and deleting items based on a key', 'A better data structure for faster find operations is introduced, achieving log n time with n log n build overhead', 'The concept of hashing for set data structures, focusing on efficient ways to query items by their key', 'The lecture discusses the concept of chaining in hash tables, showcasing the utilization of various data structures like linked lists, dynamic arrays, and sorted arrays to store hashed values', 'The lecture explores the limitations of achieving faster than log n time in a restricted model of computation, while also discussing potential improvements beyond this time complexity', 'A universal hash family ensures collisions with a probability less than 1/m for any two keys', "The direct access array supports up to the size of the largest key 'u'", 'The lecture explains the concept of chaining in hash tables, showcasing the utilization of various data structures like linked lists, dynamic arrays, and sorted arrays to store hashed values', 'The direct access array operation is super powerful, offering constant time access to any space in memory, potentially larger than a linear branching factor, depending on the size of the model and machine', "The concept is simpler than hashing and relies on storing items at specific indices in an array, enabling quick access based on the item's key value"]}