title
Lecture 42 — Content Based Recommendations | Stanford University

detail
{'title': 'Lecture 42 — Content Based Recommendations | Stanford University', 'heatmap': [{'end': 138.733, 'start': 124.896, 'weight': 0.717}, {'end': 208.532, 'start': 193.516, 'weight': 0.764}, {'end': 281.754, 'start': 230.336, 'weight': 0.782}, {'end': 447.806, 'start': 413.94, 'weight': 0.906}, {'end': 497.62, 'start': 471.251, 'weight': 0.704}, {'end': 606.439, 'start': 590.296, 'weight': 0.855}, {'end': 801.14, 'start': 775.859, 'weight': 1}], 'summary': 'Covers content-based recommender systems, user profiles and item recommendations, tf-idf technique for content recommendation, user profiles and boolean utility matrix, and the theory, pros, and cons of content-based recommendation approach.', 'chapters': [{'end': 45.866, 'segs': [{'end': 45.866, 'src': 'embed', 'start': 0.422, 'weight': 0, 'content': [{'end': 2.123, 'text': 'Welcome back to Mining of Massive Data Sets.', 'start': 0.422, 'duration': 1.701}, {'end': 7.308, 'text': "We're going to continue our lesson on recommender systems by looking at content based recommendation systems.", 'start': 2.143, 'duration': 5.165}, {'end': 16.915, 'text': 'The main idea behind content based recommendation systems is to recommend items to a customer x,', 'start': 10.71, 'duration': 6.205}, {'end': 20.398, 'text': 'similar to previous items rated highly by the same customer.', 'start': 16.915, 'duration': 3.483}, {'end': 30.175, 'text': 'For example, in an example of movies, you might recommend movies with the same actor or actors, director, genre, and so on.', 'start': 21.872, 'duration': 8.303}, {'end': 37.718, 'text': 'In the case of websites, blogs, or news, we might recommend articles with similar content or on similar topics.', 'start': 31.336, 'duration': 6.382}, {'end': 42.98, 'text': 'In the case of people recommendations, we might recommend people with many common friends to each other.', 'start': 38.418, 'duration': 4.562}, {'end': 45.866, 'text': "So here's our plan of action.", 'start': 44.946, 
'duration': 0.92}], 'summary': 'Content-based recommendation systems recommend items based on similarities to previously rated items by the same customer.', 'duration': 45.444, 'max_score': 0.422, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY422.jpg'}], 'start': 0.422, 'title': 'Content-based recommender systems', 'summary': 'Explains content-based recommendation systems, which recommend items to a customer based on previous highly-rated items, using similar attributes like actor, director, genre, or content.', 'chapters': [{'end': 45.866, 'start': 0.422, 'title': 'Content-based recommender systems', 'summary': 'Explains content-based recommendation systems, which recommend items to a customer based on previous highly-rated items, using similar attributes like actor, director, genre, or content.', 'duration': 45.444, 'highlights': ['Content-based recommendation systems recommend items to a customer based on previous highly-rated items, using similar attributes like actor, director, genre, or content.', 'Examples include recommending movies with the same actor or director, similar articles for websites, or people with many common friends.', 'The systems aim to provide personalized recommendations by analyzing attributes and preferences of the customer, enhancing user experience and increasing engagement.']}], 'duration': 45.444, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY422.jpg', 'highlights': ['Content-based recommendation systems recommend items based on previous highly-rated items and similar attributes.', 'Examples include recommending movies with the same actor or director, similar articles for websites, or people with many common friends.', 'The systems aim to provide personalized recommendations by analyzing attributes and preferences of the customer.']}, {'end': 195.078, 'segs': [{'end': 72.942, 'src': 'embed', 'start': 46.527, 'weight': 
1, 'content': [{'end': 53.609, 'text': "We're going to start with the user and find out a set of items the user likes using both explicit and implicit data.", 'start': 46.527, 'duration': 7.082}, {'end': 61.191, 'text': 'For example, we might look at the items that the user has rated highly and the set of items the user has purchased.', 'start': 55.449, 'duration': 5.742}, {'end': 65.132, 'text': 'And for each of those items, we are going to build an item profile.', 'start': 62.112, 'duration': 3.02}, {'end': 68.974, 'text': 'An item profile is a description of the item.', 'start': 66.293, 'duration': 2.681}, {'end': 72.942, 'text': 'For example, in this case, we are dealing with geometric shapes.', 'start': 69.94, 'duration': 3.002}], 'summary': 'Explicit and implicit data, such as high ratings and purchases, identify the items a user likes, and an item profile is built for each of them.', 'duration': 26.415, 'max_score': 46.527, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY46527.jpg'}, {'end': 124.156, 'src': 'embed', 'start': 98.308, 'weight': 0, 'content': [{'end': 104.11, 'text': 'The user profile infers the likes of the user from the profiles of the items the user likes.', 'start': 98.308, 'duration': 5.802}, {'end': 111.192, 'text': 'Because the user here likes a red circle and a red triangle, we infer that the user likes the color red.', 'start': 104.75, 'duration': 6.442}, {'end': 113.773, 'text': 'They like circles and they like triangles.', 'start': 111.612, 'duration': 2.161}, {'end': 121.035, 'text': 'Now, once we have a profile of the user, we can then match that against the catalog and recommend other items to the user.', 'start': 115.053, 'duration': 5.982}, {'end': 124.156, 'text': "So let's say the catalog has a bunch of items in it.", 'start': 121.535, 'duration': 2.621}], 'summary': 'The user likes red, circles, and triangles, enabling personalized recommendations.', 'duration': 25.848, 
'max_score': 98.308, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY98308.jpg'}, {'end': 156.047, 'src': 'heatmap', 'start': 124.896, 'weight': 0.717, 'content': [{'end': 127.557, 'text': 'Some of those items are red, so we can recommend those to the user.', 'start': 124.896, 'duration': 2.661}, {'end': 132.846, 'text': "So let's, let's look at how to build these item profiles.", 'start': 130.103, 'duration': 2.743}, {'end': 138.733, 'text': 'For each item, we want to create an item profile, which we can then use to build user profiles.', 'start': 133.767, 'duration': 4.966}, {'end': 143.165, 'text': 'So the profile is a set of features about the item.', 'start': 140.404, 'duration': 2.761}, {'end': 151.366, 'text': 'In the case of movies, for instance, the item profile might include author, title, actor, director, and so on.', 'start': 143.625, 'duration': 7.741}, {'end': 156.047, 'text': 'In the case of images and videos we might use metadata and tags.', 'start': 152.166, 'duration': 3.881}], 'summary': 'Build item profiles by capturing features like author, title, actor, director for movies, and metadata and tags for images and videos.', 'duration': 31.151, 'max_score': 124.896, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY124896.jpg'}, {'end': 166.369, 'src': 'embed', 'start': 143.625, 'weight': 2, 'content': [{'end': 151.366, 'text': 'In the case of movies, for instance, the item profile might include author, title, actor, director, and so on.', 'start': 143.625, 'duration': 7.741}, {'end': 156.047, 'text': 'In the case of images and videos we might use metadata and tags.', 'start': 152.166, 'duration': 3.881}, {'end': 160.328, 'text': 'In the case of people, the item profile might be a set of friends of the user.', 'start': 156.647, 'duration': 3.681}, {'end': 166.369, 'text': "Even though the item profile is a set of features, it's often 
convenient to think of it as a vector.", 'start': 161.228, 'duration': 5.141}], 'summary': 'Item profiles for movies, images, videos, and people are represented as vectors with specific features and metadata.', 'duration': 22.744, 'max_score': 143.625, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY143625.jpg'}], 'start': 46.527, 'title': 'User profile and item recommendation', 'summary': 'Discusses creating item profiles based on user preferences, using explicit and implicit data to infer user profiles, and matching them against the catalog to recommend relevant items.', 'chapters': [{'end': 195.078, 'start': 46.527, 'title': 'User profile and item recommendation', 'summary': 'Discusses creating item profiles based on user preferences, using explicit and implicit data to infer user profiles, and matching them against the catalog to recommend relevant items.', 'duration': 148.551, 'highlights': ['The process involves finding a set of items the user likes using explicit and implicit data, such as highly rated items and purchased items, then building item profiles for each of those items (e.g., description of the item, features about the item).', 'Inferencing a user profile from the item profiles by identifying the common features liked by the user, such as the color red, circles, and triangles, then using this profile to recommend other relevant items from the catalog.', 'The item profiles can be created for different types of items (e.g., movies, images, videos) by including specific features (e.g., author, title, actor, director, metadata, tags) and representing them as vectors with Boolean or real-valued entries per feature (e.g., 0 or 1 for each actor or director in a movie).']}], 'duration': 148.551, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY46527.jpg', 'highlights': ['Inferencing a user profile from the item profiles by identifying the 
common features liked by the user, such as the color red, circles, and triangles, then using this profile to recommend other relevant items from the catalog.', 'The process involves finding a set of items the user likes using explicit and implicit data, such as highly rated items and purchased items, then building item profiles for each of those items (e.g., description of the item, features about the item).', 'The item profiles can be created for different types of items (e.g., movies, images, videos) by including specific features (e.g., author, title, actor, director, metadata, tags) and representing them as vectors with Boolean or real-valued entries per feature (e.g., 0 or 1 for each actor or director in a movie).']}, {'end': 413.28, 'segs': [{'end': 226.515, 'src': 'embed', 'start': 196.119, 'weight': 3, 'content': [{'end': 198.341, 'text': 'For example, we might be recommending news articles.', 'start': 196.119, 'duration': 2.222}, {'end': 208.532, 'text': "Now, what's the item profile in this case? 
The simplest item profile in this case is to pick the set of important words in the document or the item.", 'start': 199.843, 'duration': 8.689}, {'end': 211.555, 'text': 'How do you pick the important words in the item?', 'start': 209.754, 'duration': 1.801}, {'end': 219.293, 'text': 'The usual heuristic that we get from text mining is a technique called TF-IDF or term frequency, inverse document frequency.', 'start': 212.591, 'duration': 6.702}, {'end': 226.515, 'text': "Many of you may have come across TF-IDF in the context of information retrieval, but for those of you who have not, here's a quick refresher.", 'start': 220.053, 'duration': 6.462}], 'summary': 'Using TF-IDF to select important words for news article recommendations.', 'duration': 30.396, 'max_score': 196.119, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY196119.jpg'}, {'end': 329.108, 'src': 'heatmap', 'start': 230.336, 'weight': 0, 'content': [{'end': 238.199, 'text': "Let's say we are looking at a document or item j and we are computing the score for term or feature i.", 'start': 230.336, 'duration': 7.863}, {'end': 251.638, 'text': 'The term frequency tfij for feature i in document j is just the number of times feature i appears in document j,', 'start': 239.349, 'duration': 12.289}, {'end': 256.821, 'text': 'divided by the maximum number of times that same feature appears in any document.', 'start': 251.638, 'duration': 5.183}, {'end': 261.785, 'text': "For example, let's say the feature is a certain word, the word apple.", 'start': 257.402, 'duration': 4.383}, {'end': 267.789, 'text': "And in the document that we're looking at, the word apple appears five times.", 'start': 263.266, 'duration': 4.523}, {'end': 272.571, 'text': "But there's another document where the word apple appears 23 times.", 'start': 268.45, 'duration': 4.121}, {'end': 276.453, 'text': 'And this is the maximum number of times the word 
apple appears in any document at all.', 'start': 272.851, 'duration': 3.602}, {'end': 281.754, 'text': 'Then the term frequency tfij is, is 5 divided by 23.', 'start': 277.013, 'duration': 4.741}, {'end': 289.417, 'text': "Now, I'm glossing over the fact that we need to normalize tf to account for the fact that document lengths are different.", 'start': 281.754, 'duration': 7.663}, {'end': 290.757, 'text': "Let's just ignore that for the moment.", 'start': 289.437, 'duration': 1.32}, {'end': 297.746, 'text': 'Now, the term frequency captures the number of times a term appears in a document.', 'start': 291.698, 'duration': 6.048}, {'end': 303.91, 'text': 'Intuitively, the more often a term appears in a document, the more important a feature it is.', 'start': 298.166, 'duration': 5.744}, {'end': 309.774, 'text': 'For example, if a document mentions the word Apple five times,', 'start': 304.17, 'duration': 5.604}, {'end': 314.357, 'text': 'the word Apple is more important in that document than another document that just mentions it once.', 'start': 309.774, 'duration': 4.583}, {'end': 319.266, 'text': 'But how do you compare the weight of different terms?', 'start': 316.385, 'duration': 2.881}, {'end': 325.227, 'text': 'For example, you know the, a rare word appearing just a couple of times,', 'start': 319.926, 'duration': 5.301}, {'end': 329.108, 'text': 'might be more important than a more common word like the appearing thousands of times.', 'start': 325.227, 'duration': 3.881}], 'summary': 'Term frequency measures importance of terms in documents.', 'duration': 52.095, 'max_score': 230.336, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY230336.jpg'}, {'end': 413.28, 'src': 'embed', 'start': 357.554, 'weight': 1, 'content': [{'end': 365.781, 'text': 'Notice that the more common a term, the larger ni, and the larger ni, the lower the IDF.', 'start': 357.554, 'duration': 8.227}, {'end': 374.961, 'text': 'The 
IDF function ensures, you know, gives a lower weight to more common words and a higher weight to rarer words.', 'start': 367.714, 'duration': 7.247}, {'end': 388.354, 'text': 'So if you put these two pieces together, the TF-IDF score of feature i for document j is obtained by multiplying the term frequency and the IDF.', 'start': 376.162, 'duration': 12.192}, {'end': 392.578, 'text': 'So given a document.', 'start': 391.597, 'duration': 0.981}, {'end': 397.455, 'text': 'You compute the TF-IDF scores for every term in the document.', 'start': 393.214, 'duration': 4.241}, {'end': 402.217, 'text': 'And then you sort all the terms in the document by the TF-IDF scores.', 'start': 398.395, 'duration': 3.822}, {'end': 410.519, 'text': 'And then you have some kind of threshold or, or you might pick the set of words with the highest TF-IDF scores in the document,', 'start': 402.997, 'duration': 7.522}, {'end': 413.28, 'text': 'together with their scores, and that would be the doc profile.', 'start': 410.519, 'duration': 2.761}], 'summary': 'Tf-idf score is computed by multiplying term frequency and idf, giving lower weight to common words and higher weight to rarer words.', 'duration': 55.726, 'max_score': 357.554, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY357554.jpg'}], 'start': 196.119, 'title': 'Understanding tf-idf for content recommendation', 'summary': 'Covers tf-idf technique for news article recommendation, term frequency calculation for word importance, and the importance of tf-idf in giving weight to rarer words, highlighting its role in document analysis.', 'chapters': [{'end': 251.638, 'start': 196.119, 'title': 'Content recommendation techniques', 'summary': 'Discusses the use of tf-idf technique for recommending news articles based on important words in the document, with a quick refresher on term frequency and inverse document frequency.', 'duration': 55.519, 'highlights': ['The usual heuristic 
for picking important words in the item is TF-IDF, which is a technique for computing the score for a term or feature in a document or item.', 'Term frequency tfij for feature i in document j is the number of times the feature i appears in document j.']}, {'end': 329.108, 'start': 251.638, 'title': 'Term frequency calculation', 'summary': "Discusses the calculation of term frequency using the example of the word 'apple', explaining that it is calculated by dividing the number of times the word appears in a document by the maximum number of times it appears in any document, emphasizing the importance of a term based on its frequency and comparing the weight of different terms.", 'duration': 77.47, 'highlights': ['The term frequency tfij is calculated by dividing the number of times a term appears in a document by the maximum number of times it appears in any document, such as 5 divided by 23.', 'Term frequency captures the number of times a term appears in a document, indicating the importance of a feature based on its frequency.', 'The chapter discusses the importance of comparing the weight of different terms, highlighting that a rare word appearing just a couple of times might be more important than a more common word appearing thousands of times.']}, {'end': 413.28, 'start': 330.028, 'title': 'Understanding tf-idf for document analysis', 'summary': 'Explains the concept of inverse document frequency (idf) and term frequency-inverse document frequency (tf-idf) scores, highlighting the importance of tf-idf in giving lower weight to common words and higher weight to rarer words, and the process of computing tf-idf scores for document analysis.', 'duration': 83.252, 'highlights': ['The process of computing the TF-IDF scores for every term in a document and sorting them by the TF-IDF scores is explained. 
Process of computing TF-IDF scores, Sorting terms by TF-IDF scores', 'The importance of IDF in giving lower weight to more common words and higher weight to rarer words is emphasized. Importance of IDF, Lower weight to common words, Higher weight to rarer words', 'The role of TF-IDF in providing a lower weight to common words and a higher weight to rarer words is highlighted. Role of TF-IDF, Lower weight to common words, Higher weight to rarer words']}], 'duration': 217.161, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY196119.jpg', 'highlights': ['The chapter discusses the importance of comparing the weight of different terms, highlighting that a rare word appearing just a couple of times might be more important than a more common word appearing thousands of times.', 'The process of computing the TF-IDF scores for every term in a document and sorting them by the TF-IDF scores is explained.', 'Term frequency captures the number of times a term appears in a document, indicating the importance of a feature based on its frequency.', 'The usual heuristic for picking important words in the item is TF-IDF, which is a technique for computing the score for a term or feature in a document or item.', 'The importance of IDF in giving lower weight to more common words and higher weight to rarer words is emphasized.']}, {'end': 922.406, 'segs': [{'end': 447.806, 'src': 'heatmap', 'start': 413.94, 'weight': 0.906, 'content': [{'end': 418.381, 'text': 'So in this case, the doc profile is a real value vector as opposed to a Boolean vector.', 'start': 413.94, 'duration': 4.441}, {'end': 423.817, 'text': 'Now that we have item profiles, our next task is to construct user profiles.', 'start': 419.793, 'duration': 4.024}, {'end': 430.823, 'text': "Let's say we have a user who's rated items with profiles i1 through in.", 'start': 426.439, 'duration': 4.384}, {'end': 436.828, 'text': 'Now remember, i1 through in are our 
vectors.', 'start': 433.765, 'duration': 3.063}, {'end': 447.806, 'text': "Let's say this is i1, this is i2, i3, and so on.", 'start': 442.462, 'duration': 5.344}], 'summary': 'Construct user profiles from item profiles using real-valued vectors.', 'duration': 33.866, 'max_score': 413.94, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY413940.jpg'}, {'end': 529.747, 'src': 'heatmap', 'start': 471.251, 'weight': 0, 'content': [{'end': 480.875, 'text': 'So if I take the item profiles of all the items the user has rated and then take the average,', 'start': 471.251, 'duration': 9.624}, {'end': 484.197, 'text': 'that would be the simplest way of constructing a user profile.', 'start': 481.436, 'duration': 2.761}, {'end': 489.037, 'text': "Now this doesn't take into account that the user liked certain items more than others.", 'start': 485.276, 'duration': 3.761}, {'end': 497.62, 'text': 'So in that case, we might want to use a weighted average where the weight is equal to the rating given by the user for each item.', 'start': 489.658, 'duration': 7.962}, {'end': 501.082, 'text': 'Then you would have a weighted average item profile.', 'start': 498.481, 'duration': 2.601}, {'end': 510.505, 'text': 'A variant of this is to normalize these weights using the average rating of the user.', 'start': 506.123, 'duration': 4.382}, {'end': 513.426, 'text': "And we'll see an example that makes this idea clear.", 'start': 510.805, 'duration': 2.621}, {'end': 518.355, 'text': 'And of course, much more sophisticated aggregations are possible.', 'start': 515.631, 'duration': 2.724}, {'end': 521.018, 'text': "Here we're only looking at some very simple examples.", 'start': 518.635, 'duration': 2.383}, {'end': 529.747, 'text': "Let's look at an example that will clarify weighted average item profiles and how to normalize weights.", 'start': 522.619, 'duration': 7.128}], 
'summary': 'Construct user profiles using average and weighted average item profiles, with a focus on normalization.', 'duration': 48.311, 'max_score': 471.251, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY471251.jpg'}, {'end': 614.981, 'src': 'heatmap', 'start': 590.296, 'weight': 6, 'content': [{'end': 598.018, 'text': 'And so the weight of feature A is going to be 2 divided by the total number of item profiles which is 5 which is 0.4.', 'start': 590.296, 'duration': 7.722}, {'end': 603.779, 'text': 'And the weight of feature B correspondingly is going to be 3 by 5.', 'start': 598.018, 'duration': 5.761}, {'end': 606.439, 'text': "Let's look at a more complex example with star ratings.", 'start': 603.779, 'duration': 2.66}, {'end': 612.881, 'text': 'Suppose we have star ratings in the range 1 to 5.', 'start': 609.46, 'duration': 3.421}, {'end': 614.981, 'text': 'And the user has once again watched five movies.', 'start': 612.881, 'duration': 2.1}], 'summary': 'Weights for features a and b are 0.4 and 0.6 respectively.', 'duration': 24.685, 'max_score': 590.296, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY590296.jpg'}, {'end': 734.546, 'src': 'embed', 'start': 704.334, 'weight': 5, 'content': [{'end': 710.477, 'text': "And so what you're going to do is to subtract the average rating from each of the individual movie ratings.", 'start': 704.334, 'duration': 6.143}, {'end': 719.938, 'text': 'So in this case, the movies with actor A, the normalized ratings in that case, instead of 3 and 5, becomes 0 and plus 2.', 'start': 711.098, 'duration': 8.84}, {'end': 727.282, 'text': 'And for actor B, the normalized ratings become minus 2, minus 1, and plus 1.', 'start': 719.938, 'duration': 7.344}, {'end': 734.546, 'text': 'Notice that this captures intuition that the user did not like the the first two movies with actor B,', 'start': 727.282, 'duration': 
7.264}], 'summary': 'Subtract average rating to normalize movie ratings for actors a and b.', 'duration': 30.212, 'max_score': 704.334, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY704334.jpg'}, {'end': 801.14, 'src': 'heatmap', 'start': 775.859, 'weight': 1, 'content': [{'end': 784.041, 'text': 'This indicates a mild positive preference for for actor A and a mild negative preference for actor B.', 'start': 775.859, 'duration': 8.182}, {'end': 789.102, 'text': 'Now that we have user profiles and item profiles, the next task is to recommend certain items to the user.', 'start': 784.041, 'duration': 5.061}, {'end': 801.14, 'text': 'The key step in this is to take a pair of user profile and item profile and figure what the rating for that user and item pair is likely to be.', 'start': 790.542, 'duration': 10.598}], 'summary': 'Mild preference for actor a, negative for actor b; task: recommend items based on user and item profiles', 'duration': 25.281, 'max_score': 775.859, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY775859.jpg'}, {'end': 865.107, 'src': 'embed', 'start': 841.339, 'weight': 4, 'content': [{'end': 850.365, 'text': "And this distance in this case, we'll call this the cosine similarity between the user x and the item i.", 'start': 841.339, 'duration': 9.026}, {'end': 856.43, 'text': 'Now, technically, the cosine distance is actually the angle theta and not the cosine of the angle right?', 'start': 850.365, 'duration': 6.065}, {'end': 861.164, 'text': 'The cosine distance, as we studied in an earlier lecture, is the angle theta.', 'start': 856.921, 'duration': 4.243}, {'end': 865.107, 'text': 'And the cosine similarity is the angle 180 minus theta.', 'start': 861.464, 'duration': 3.643}], 'summary': 'Cosine similarity measures similarity between user x and item i.', 'duration': 23.768, 'max_score': 841.339, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY841339.jpg'}], 'start': 413.94, 'title': 'User profiles and boolean utility matrix', 'summary': 'Explains the construction of user profiles and boolean utility matrix, covering averaging item profiles, weighted averages based on user ratings, normalization, and sophisticated aggregations. it also explores the process of computing user and item profiles, normalizing ratings, and utilizing cosine similarity to recommend items, with key points including the computation of profile weights and the use of cosine similarity as a distance metric.', 'chapters': [{'end': 529.747, 'start': 413.94, 'title': 'User profile construction', 'summary': 'Explains the construction of user profiles by averaging item profiles and using weighted averages based on user ratings, with examples of normalization and sophisticated aggregations.', 'duration': 115.807, 'highlights': ['User profiles are constructed by averaging item profiles, with the simplest method being to take the average of all item profiles rated by the user.', "Weighted average item profiles can be formed by assigning weights equal to the user's rating for each item, allowing for a more personalized representation of user preferences.", 'Normalization of weights using the average rating of the user is a variant that enhances the accuracy of weighted average item profiles.', 'The chapter emphasizes that more sophisticated aggregations beyond simple averaging and weighted averaging are possible for constructing user profiles.']}, {'end': 922.406, 'start': 532.71, 'title': 'Boolean utility matrix', 'summary': 'Introduces the concept of boolean utility matrix and explores the process of computing user and item profiles, normalizing ratings, and utilizing cosine similarity to recommend items, with key points including the computation of profile weights and the use of cosine similarity as a distance metric.', 'duration': 389.696, 
'highlights': ["The computation of profile weights involves calculating the ratio of the occurrence of a feature in the user's watched items, providing quantifiable data such as the weight of feature A being 0.4 and the weight of feature B being 0.6. Computation of profile weights and the quantifiable data of feature weights (e.g., 0.4 for feature A and 0.6 for feature B).", 'The process of normalizing ratings is explained, with an example illustrating the subtraction of the average rating from individual movie ratings, capturing the distinction between positive and negative ratings. Explanation of normalizing ratings and its role in capturing positive and negative ratings.', 'The utilization of cosine similarity as a distance metric is discussed, highlighting its role in estimating the angle between user and item profiles and its reflection of the similarity between the two vectors. Discussion of cosine similarity as a distance metric, emphasizing its estimation of the angle between user and item profiles and its portrayal of similarity.']}], 'duration': 508.466, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY413940.jpg', 'highlights': ['More sophisticated aggregations beyond simple averaging and weighted averaging are possible for constructing user profiles.', "Weighted average item profiles can be formed by assigning weights equal to the user's rating for each item, allowing for a more personalized representation of user preferences.", 'Normalization of weights using the average rating of the user is a variant that enhances the accuracy of weighted average item profiles.', 'User profiles are constructed by averaging item profiles, with the simplest method being to take the average of all item profiles rated by the user.', 'The utilization of cosine similarity as a distance metric is discussed, highlighting its role in estimating the angle between user and item profiles and its reflection of the similarity 
between the two vectors.', 'The process of normalizing ratings is explained, with an example illustrating the subtraction of the average rating from individual movie ratings, capturing the distinction between positive and negative ratings.', "The computation of profile weights involves calculating the ratio of the occurrence of a feature in the user's watched items, providing quantifiable data such as the weight of feature A being 0.4 and the weight of feature B being 0.6."]}, {'end': 1259.988, 'segs': [{'end': 990.033, 'src': 'embed', 'start': 952.781, 'weight': 0, 'content': [{'end': 960.489, 'text': "The biggest pro of the content-based recommendation approach is that you don't need data about other users in order to make recommendations to a specific user.", 'start': 952.781, 'duration': 7.708}, {'end': 963.914, 'text': 'This turns out to be a very, very good thing,', 'start': 961.892, 'duration': 2.022}, {'end': 970.699, 'text': 'because, you know, you can start making content-based recommendations from day one for your very first user.', 'start': 963.914, 'duration': 6.785}, {'end': 980.446, 'text': 'Another good thing about content-based recommendation is that you can recommend to users with very unique taste.', 'start': 975.762, 'duration': 4.684}, {'end': 985.51, 'text': "When we get to collaborative filtering, we'll see that,", 'start': 981.747, 'duration': 3.763}, {'end': 990.033, 'text': 'in collaborative filtering, to make recommendations to a user, we need to find other similar users.', 'start': 985.51, 'duration': 4.523}], 'summary': 'Content-based recommendation requires no data on other users, allowing immediate recommendations for unique user tastes.', 'duration': 37.252, 'max_score': 952.781, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY952781.jpg'}, {'end': 1054.092, 'src': 'embed', 'start': 1029.667, 'weight': 2, 'content': [{'end': 1035.609, 'text': 
"So we don't have a so-called first-rater problem that we'll see in the collaborative filtering approach.", 'start': 1029.667, 'duration': 5.942}, {'end': 1039.131, 'text': 'We can make recommendations for an item as soon as it becomes available.', 'start': 1036.01, 'duration': 3.121}, {'end': 1046.574, 'text': 'And finally, whenever the content-based approach makes a recommendation, you can provide an explanation', 'start': 1041.932, 'duration': 4.642}, {'end': 1049.889, 'text': 'to the user for why a certain item was recommended.', 'start': 1047.406, 'duration': 2.483}, {'end': 1054.092, 'text': 'In particular, you can just list the content features that caused the item to be recommended.', 'start': 1050.409, 'duration': 3.683}], 'summary': 'Content-based approach allows immediate item recommendations with explanations.', 'duration': 24.425, 'max_score': 1029.667, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY1029666.jpg'}, {'end': 1152.884, 'src': 'embed', 'start': 1122.243, 'weight': 4, 'content': [{'end': 1126.146, 'text': 'And for images, of course, the features are very, very hard to find.', 'start': 1122.243, 'duration': 3.903}, {'end': 1134.772, 'text': 'So in general, finding appropriate features to make content-based approaches work turns out to be a very, very hard problem.', 'start': 1127.267, 'duration': 7.505}, {'end': 1140.196, 'text': 'And this is the main reason why the content-based approach is not more popular.', 'start': 1135.112, 'duration': 5.084}, {'end': 1145.44, 'text': 'The second problem is one of over-specialization.', 'start': 1143.478, 'duration': 1.962}, {'end': 1152.884, 'text': 'Remember, the user profile is built using the item profiles of the items that the user has rated or purchased.', 'start': 1146.138, 'duration': 6.746}], 'summary': 'Finding appropriate features for content-based approaches is very hard, leading to its lack of popularity.', 
'duration': 30.641, 'max_score': 1122.243, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY1122243.jpg'}, {'end': 1200.335, 'src': 'embed', 'start': 1172.611, 'weight': 5, 'content': [{'end': 1176.614, 'text': 'In general, people might have multiple interests and might have expressed only some of them in the past.', 'start': 1172.611, 'duration': 4.003}, {'end': 1179.257, 'text': "And so, you know,", 'start': 1177.395, 'duration': 1.862}, {'end': 1187.824, 'text': "it's very easy this way to miss recommending interesting items to users because you don't have enough data on the user.", 'start': 1179.257, 'duration': 8.567}, {'end': 1195.711, 'text': "Another serious problem of the content-based approach is that it's unable to exploit the quality judgments of other users.", 'start': 1189.886, 'duration': 5.825}, {'end': 1200.335, 'text': "For example, there might be a certain video or a movie that's wildly popular.", 'start': 1196.311, 'duration': 4.024}], 'summary': "Content-based approach may miss recommending items due to limited user data and may not leverage other users' quality judgments.", 'duration': 27.724, 'max_score': 1172.611, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY1172611.jpg'}, {'end': 1245.532, 'src': 'embed', 'start': 1216.376, 'weight': 7, 'content': [{'end': 1220.899, 'text': 'The final problem that we have with the content-based approach is the cold start problem for new users.', 'start': 1216.376, 'duration': 4.523}, {'end': 1228.324, 'text': 'Remember, the user profile is built by aggregating item profiles of the items the user has rated.', 'start': 1222, 'duration': 6.324}, {'end': 1235.233, 'text': 'When you have a new user, the new user has not rated any items, and so there is no user profile.', 'start': 1229.107, 'duration': 6.126}, {'end': 1239.698, 'text': "So there's a challenging problem of 
how to build a user profile for a new user.", 'start': 1236.034, 'duration': 3.664}, {'end': 1245.532, 'text': 'In most practical situations, new users start with.', 'start': 1241.528, 'duration': 4.004}], 'summary': 'Content-based approach faces a cold start problem: a new user has rated no items, so there is no user profile.', 'duration': 29.156, 'max_score': 1216.376, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY1216376.jpg'}], 'start': 922.406, 'title': 'Content-based recommendation', 'summary': "Discusses the theory, pros, and cons of the content-based recommendation approach, including its ability to cater to unique user tastes, recommend new items, and provide explanations. It also outlines challenges like finding appropriate features for different media types and limitations such as the inability to recommend items outside a user's past preferences.", 'chapters': [{'end': 1071.965, 'start': 922.406, 'title': 'Content-based recommendations', 'summary': 'Discusses the theory, pros, and cons of the content-based recommendation approach, highlighting its ability to make recommendations without needing data about other users, cater to users with unique taste, recommend new and unpopular items, and provide explanations for recommendations.', 'duration': 149.559, 'highlights': ['Content-based recommendation approach can make recommendations without needing data about other users, allowing for immediate recommendations for the very first user.', 'Content-based recommendation approach can recommend to users with very unique taste, addressing the limitations of collaborative filtering in finding similar users.', 'Content-based recommendation approach can recommend new and unpopular items without needing ratings from users, solving the first-rater problem in collaborative filtering.', 'Content-based recommendation approach provides explanations for recommendations, allowing for transparency and personalized insights into why 
certain items are recommended.']}, {'end': 1145.44, 'start': 1075.916, 'title': 'Challenges of content-based approach', 'summary': 'Discusses the challenges of content-based approach, highlighting the difficulty in finding appropriate features for images, movies, and music, and the issue of over-specialization.', 'duration': 69.524, 'highlights': ['The main problem with the content-based approach is the difficulty in finding appropriate features, making it a very hard problem. This is the main reason why the approach is not more popular.', 'Finding features for movies, music, and images is very hard due to the difficulty in boxing them into specific genres, actors, or directors.', 'Movies often cross genres, and users are not very often loyal to specific actors or directors, making it hard to define features for movies.']}, {'end': 1259.988, 'start': 1146.138, 'title': 'Issues with content-based recommendation', 'summary': "Discusses the limitations of content-based recommendation systems, including the inability to recommend items outside a user's past preferences, the challenge of recommending items with insufficient user data, and the cold start problem for new users.", 'duration': 113.85, 'highlights': ["The content-based approach is limited in recommending items outside a user's past preferences, which may lead to missing interesting items for the user.", 'Another limitation is the inability of the content-based approach to exploit the quality judgments of other users, potentially causing popular items not preferred by the user to be overlooked.', 'The cold start problem for new users is a significant challenge as there is no user profile to base recommendations on until the user rates more items over time.']}], 'duration': 337.582, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/2uxXPzm-7FY/pics/2uxXPzm-7FY922406.jpg', 'highlights': ['Content-based recommendation approach can make recommendations without needing data about other 
users, allowing for immediate recommendations for the very first user.', 'Content-based recommendation approach can recommend to users with very unique taste, addressing the limitations of collaborative filtering in finding similar users.', 'Content-based recommendation approach can recommend new and unpopular items without needing ratings from users, solving the first-rater problem in collaborative filtering.', 'Content-based recommendation approach provides explanations for recommendations, allowing for transparency and personalized insights into why certain items are recommended.', 'The main problem with the content-based approach is the difficulty in finding appropriate features, making it a very hard problem. This is the main reason why the approach is not more popular.', "The content-based approach is limited in recommending items outside a user's past preferences, which may lead to missing interesting items for the user.", 'Another limitation is the inability of the content-based approach to exploit the quality judgments of other users, potentially causing popular items not preferred by the user to be overlooked.', 'The cold start problem for new users is a significant challenge as there is no user profile to base recommendations on until the user rates more items over time.']}], 'highlights': ['Content-based recommendation systems recommend items based on previous highly-rated items and similar attributes.', 'The systems aim to provide personalized recommendations by analyzing attributes and preferences of the customer.', 'Inferencing a user profile from the item profiles by identifying the common features liked by the user, such as the color red, circles, and triangles, then using this profile to recommend other relevant items from the catalog.', 'The process involves finding a set of items the user likes using explicit and implicit data, such as highly rated items and purchased items, then building item profiles for each of those items (e.g., description 
of the item, features about the item).', 'The importance of IDF in giving lower weight to more common words and higher weight to rarer words is emphasized.', "Weighted average item profiles can be formed by assigning weights equal to the user's rating for each item, allowing for a more personalized representation of user preferences.", 'Content-based recommendation approach can make recommendations without needing data about other users, allowing for immediate recommendations for the very first user.', 'Content-based recommendation approach provides explanations for recommendations, allowing for transparency and personalized insights into why certain items are recommended.', 'The main problem with the content-based approach is the difficulty in finding appropriate features, making it a very hard problem. This is the main reason why the approach is not more popular.', "The content-based approach is limited in recommending items outside a user's past preferences, which may lead to missing interesting items for the user."]}