title
Stanford CS25: V3 I Retrieval Augmented Language Models
description
December 5, 2023
Douwe Kiela, Contextual AI
Language models have led to amazing progress, but they also have important shortcomings. One solution for many of these shortcomings is retrieval augmentation. I will introduce the topic, survey recent literature on retrieval augmented language models and finish with some of the main open questions.
More about the course can be found here: https://web.stanford.edu/class/cs25/
View the entire CS25 Transformers United playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNiJRchCzutFw5ItR_Z27CM
detail
{'title': 'Stanford CS25: V3 I Retrieval Augmented Language Models', 'heatmap': [], 'summary': 'Covers various chapters including a deep dive into language model evolution, AI system training and testing, vector database efficiency, challenges with frozen RAG architecture, document encoder efficiency, transformer and language model retrieval, language model retrieval augmentation, and the future of RAG and multimodality, offering comprehensive insights into current trends and advancements in retrieval augmented language models.', 'chapters': [{'end': 82.165, 'segs': [{'end': 55.873, 'src': 'embed', 'start': 5.794, 'weight': 0, 'content': [{'end': 10.438, 'text': 'Hey guys, welcome to our last lecture of this quarter.', 'start': 5.794, 'duration': 4.644}, {'end': 14.841, 'text': "And we're very happy to have Douwe here.", 'start': 11.578, 'duration': 3.263}, {'end': 24.648, 'text': "He's the CEO of Contextual AI, the enterprise LLM company, as well as an adjunct professor in symbolic systems here at Stanford.", 'start': 15.381, 'duration': 9.267}, {'end': 32.374, 'text': 'And previously, he was the head of research at Hugging Face, and before that, a research scientist at Facebook AI Research.', 'start': 24.988, 'duration': 7.386}, {'end': 39.135, 'text': "He received his PhD and master's from the University of Cambridge, as well as a master's in logic from the University of Amsterdam,", 'start': 32.433, 'duration': 6.702}, {'end': 41.998, 'text': 'and studied philosophy and cognitive AI in undergrad.', 'start': 39.135, 'duration': 2.863}, {'end': 46.643, 'text': 'His work focuses on machine learning as well as NLP,', 'start': 42.979, 'duration': 3.664}, {'end': 53.351, 'text': 'specifically on developing better models for language understanding and generation and better tools for evaluation.', 'start': 46.643, 'duration': 6.708}, {'end': 55.873, 'text': 'And yeah, give it up for Douwe.', 'start': 53.351, 'duration': 2.522}], 'summary': 'Douwe Kiela, CEO of Contextual AI, 
is an adjunct professor at Stanford with a strong research background, focusing on machine learning and NLP.', 'duration': 50.079, 'max_score': 5.794, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg5794.jpg'}], 'start': 5.794, 'title': 'Douwe Kiela: CEO of Contextual AI and AI researcher', 'summary': 'Introduces Douwe Kiela, CEO of Contextual AI, an AI researcher with a strong academic background, specializing in machine learning and NLP, and having experience at prominent tech companies.', 'chapters': [{'end': 82.165, 'start': 5.794, 'title': 'Douwe Kiela: CEO of Contextual AI and AI researcher', 'summary': 'Introduces Douwe Kiela, the CEO of Contextual AI and an AI researcher, who has a strong academic background, focusing on machine learning and NLP with experience at prominent tech companies.', 'duration': 76.371, 'highlights': ['Douwe Kiela is the CEO of Contextual AI, an enterprise LLM company, and an adjunct professor in symbolic systems at Stanford, with previous roles at Hugging Face and Facebook AI Research.', "He holds a PhD and master's from the University of Cambridge, a master's in logic from the University of Amsterdam, and has studied philosophy and cognitive AI in undergrad.", 'His work centers around machine learning and NLP, emphasizing the development of better models for language understanding and generation, as well as improved tools for evaluation.']}], 'duration': 76.371, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg5794.jpg', 'highlights': ['Douwe Kiela is the CEO of Contextual AI, an enterprise LLM company, and an adjunct professor in symbolic systems at Stanford, with previous roles at Hugging Face and Facebook AI Research.', "He holds a PhD and master's from the University of Cambridge, a master's in logic from the University of Amsterdam, and has studied philosophy and cognitive AI in undergrad.", 'His work centers around machine learning and NLP, emphasizing 
the development of better models for language understanding and generation, as well as improved tools for evaluation.']}, {'end': 607.701, 'segs': [{'end': 175.289, 'src': 'embed', 'start': 134.72, 'weight': 0, 'content': [{'end': 136.361, 'text': "It's actually several decades old.", 'start': 134.72, 'duration': 1.641}, {'end': 142.224, 'text': "So I'm bringing this up because I was talking to someone, and they were like, OpenAI invented language models.", 'start': 137.942, 'duration': 4.282}, {'end': 144.024, 'text': "And I was like, you're kidding me, right?", 'start': 142.304, 'duration': 1.72}, {'end': 151.25, 'text': 'so, um, I, I went back to the literature, and this is the oldest one I could find, actually 1991.', 'start': 145.545, 'duration': 5.705}, {'end': 153.252, 'text': 'first neural language model.', 'start': 151.25, 'duration': 2.002}, {'end': 161.76, 'text': "um, there's a very nice paper from 2003, from Bengio, um, where they actually have like word embeddings and everything already in there.", 'start': 153.252, 'duration': 8.508}, {'end': 171.528, 'text': 'uh, so obviously these are LMs, not LLMs, and as it turns out, if you make them really big and you parameterize them with these massive neural nets,', 'start': 161.76, 'duration': 9.768}, {'end': 175.289, 'text': 'then you get something really powerful that really shows emergent properties.', 'start': 171.528, 'duration': 3.761}], 'summary': 'Neural language models date back to 1991, with significant advancements in 2003.', 'duration': 40.569, 'max_score': 134.72, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg134720.jpg'}, {'end': 223.676, 'src': 'embed', 'start': 196.597, 'weight': 2, 'content': [{'end': 203.281, 'text': "And so that's why it was so easy to come up with this in 1991 already, because it's like the idea is very intuitive.", 'start': 196.597, 'duration': 6.684}, {'end': 208.104, 'text': 'But for a long time, what was 
really broken with this was the user interface.', 'start': 203.882, 'duration': 4.222}, {'end': 214.229, 'text': 'And this, I think a lot of people kind of misunderstand what ChatGPT was about.', 'start': 209.025, 'duration': 5.204}, {'end': 216.41, 'text': "That's really what ChatGPT fixed.", 'start': 214.989, 'duration': 1.421}, {'end': 223.676, 'text': 'So that initially you had to come up with these very weird prompts in order to get your language model to do what you wanted it to do.', 'start': 217.111, 'duration': 6.565}], 'summary': 'The neural language model idea was already intuitive in 1991; what ChatGPT fixed was the user interface, removing the need for strange prompts.', 'duration': 27.079, 'max_score': 196.597, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg196597.jpg'}, {'end': 314.084, 'src': 'embed', 'start': 281.143, 'weight': 1, 'content': [{'end': 285.007, 'text': 'So right now language models are kind of taking the world by storm.', 'start': 281.143, 'duration': 3.864}, {'end': 291.814, 'text': 'But if you talk to anyone, especially in an enterprise, for example, where they have very strict accuracy requirements,', 'start': 285.047, 'duration': 6.767}, {'end': 294.516, 'text': "they will tell you that they can't really productionize this yet.", 'start': 291.814, 'duration': 2.702}, {'end': 302.288, 'text': 'And the reason is because there are all these familiar problems. 
probably a bunch of you are working on these problems right now around hallucination.', 'start': 295.598, 'duration': 6.69}, {'end': 309.799, 'text': 'So these models, they kind of make up stuff very often with very high confidence, which is even more scary in a way.', 'start': 303.389, 'duration': 6.41}, {'end': 314.084, 'text': "Attribution: we don't really know why these models are saying what they're saying.", 'start': 310.66, 'duration': 3.424}], 'summary': 'Language models face accuracy and attribution challenges, hindering productionization.', 'duration': 32.941, 'max_score': 281.143, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg281143.jpg'}, {'end': 402.441, 'src': 'embed', 'start': 373.284, 'weight': 4, 'content': [{'end': 375.186, 'text': "so that's really just RAG, right.", 'start': 373.284, 'duration': 1.902}, {'end': 377.107, 'text': 'so we can.', 'start': 375.186, 'duration': 1.921}, {'end': 385.752, 'text': 'this whole lecture is basically about RAG, but the way to understand what is going on here is we have this generator, just like before.', 'start': 377.107, 'duration': 8.645}, {'end': 392.596, 'text': 'we have the input and a prompt, just like before, but now, instead of just giving those two things, we give this additional context.', 'start': 385.752, 'duration': 6.844}, {'end': 396.298, 'text': "So we contextualize the language model using things we've retrieved.", 'start': 392.636, 'duration': 3.662}, {'end': 400.5, 'text': 'And the retriever is very often pretty simple.', 'start': 397.479, 'duration': 3.021}, {'end': 402.441, 'text': "It's just a query and a document encoder.", 'start': 400.6, 'duration': 1.841}], 'summary': 'Lecture focuses on contextualizing language model using retrieved information.', 'duration': 29.157, 'max_score': 373.284, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg373284.jpg'}, {'end': 
485.844, 'src': 'embed', 'start': 458.658, 'weight': 6, 'content': [{'end': 463.62, 'text': "So what we're doing with RAG is we're adding this non-parametric retrieval component.", 'start': 458.658, 'duration': 4.962}, {'end': 469.082, 'text': 'So you might call this semi-parametric if you want to give this a name.', 'start': 464.16, 'duration': 4.922}, {'end': 470.803, 'text': 'All right.', 'start': 470.583, 'duration': 0.22}, {'end': 473.784, 'text': 'so why does that actually solve these issues?', 'start': 470.803, 'duration': 2.981}, {'end': 482.381, 'text': 'And so the answer is basically that if you have this separate index right, this separate retriever, you can swap it in, you can swap it out,', 'start': 474.935, 'duration': 7.446}, {'end': 485.844, 'text': 'you can replace it with a new index so you can really customize it.', 'start': 482.381, 'duration': 3.463}], 'summary': 'RAG adds a non-parametric retrieval component, making the system semi-parametric and customizable through a swappable index.', 'duration': 27.186, 'max_score': 458.658, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg458658.jpg'}], 'start': 82.905, 'title': 'Language model evolution and customization', 'summary': 'Delves into the evolution of language models, addressing misconceptions, emergence of powerful models, and challenges in productionizing. 
It also covers the need for model customization, challenges in adapting to different use cases, and the use of RAG for improved performance and grounding.', 'chapters': [{'end': 314.084, 'start': 82.905, 'title': 'Retrieval augmentation in language models', 'summary': 'Discusses the evolution of language models, highlighting the misconception about their invention, the emergence of powerful language models, the importance of fixing the user interface, and the challenges around productionizing language models due to issues like hallucination and unknown attributions.', 'duration': 231.179, 'highlights': ['The misconception about the invention of language models is addressed, elucidating that it is not a recent idea and was introduced several decades ago.', 'The emergence of powerful language models through parameterizing massive neural nets is highlighted, showcasing their significant emergent properties and the excitement surrounding them.', 'The importance of fixing the user interface for language models, as demonstrated by ChatGPT, is emphasized to provide a more intuitive and user-friendly interaction with the models.', 'The challenges around productionizing language models, particularly related to issues like hallucination and unknown attributions, are underscored, especially in enterprise settings with strict accuracy requirements.']}, {'end': 607.701, 'start': 314.865, 'title': 'Customizing language models with RAG', 'summary': 'Discusses the need for language models to stay up-to-date, the challenges of customizing models for different use cases, and the use of RAG to incorporate external context for better performance and grounding, addressing issues such as staleness, customization, and grounding.', 'duration': 292.836, 'highlights': ['The need for language models to remain up-to-date and never go stale, and the challenges of model editing and customization for different use cases and data.', 'The use of RAG to contextualize language models by 
incorporating external context for better performance and grounding, addressing issues such as staleness, customization, and grounding.', 'The distinction between parametric and semi-parametric approaches in language model systems, and the benefits of RAG in allowing customization and updates to the index for better performance and grounding.']}], 'duration': 524.796, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg82905.jpg', 'highlights': ['The emergence of powerful language models through parameterizing massive neural nets', 'The challenges around productionizing language models, particularly related to issues like hallucination and unknown attributions', 'The importance of fixing the user interface for language models, as demonstrated by ChatGPT', 'The need for language models to remain up-to-date and never go stale, and the challenges of model editing and customization for different use cases and data', 'The use of RAG to contextualize language models by incorporating external context for better performance and grounding', 'The misconception about the invention of language models is addressed, elucidating that it is not a recent idea and was introduced several decades ago', 'The distinction between parametric and semi-parametric approaches in language model systems, and the benefits of RAG in allowing customization and updates to the index for better performance and grounding']}, {'end': 1000.042, 'segs': [{'end': 636.796, 'src': 'embed', 'start': 608.815, 'weight': 1, 'content': [{'end': 613.419, 'text': "it's useful for you to think about what happens during training time and what happens during test time.", 'start': 608.815, 'duration': 4.604}, {'end': 617.922, 'text': "So during training time, it's really, OK, we have this language model.", 'start': 614.359, 'duration': 3.563}, {'end': 619.163, 'text': 'We have this retriever.', 'start': 617.942, 'duration': 1.221}, {'end': 621.725, 'text': 'Which 
one do we update?', 'start': 620.604, 'duration': 1.121}, {'end': 622.886, 'text': 'How do we update them?', 'start': 621.765, 'duration': 1.121}, {'end': 624.727, 'text': 'How do we train this entire system?', 'start': 622.966, 'duration': 1.761}, {'end': 626.288, 'text': 'Do we maybe not train it at all?', 'start': 624.787, 'duration': 1.501}, {'end': 628.45, 'text': 'Do we pre-train it from scratch?', 'start': 627.029, 'duration': 1.421}, {'end': 632.593, 'text': 'Do we initialize it with components that were already separately trained?', 'start': 628.49, 'duration': 4.103}, {'end': 636.796, 'text': 'These are the kinds of questions that you have to answer if you want to design a system like this.', 'start': 633.054, 'duration': 3.742}], 'summary': 'Designing a retrieval augmented system means deciding which of the language model and retriever to update, how to train them, and whether to train at all.', 'duration': 27.981, 'max_score': 608.815, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg608815.jpg'}, {'end': 678.703, 'src': 'embed', 'start': 649.729, 'weight': 3, 'content': [{'end': 655.195, 'text': "So give it different indices during test time or manipulate kind of how you're sampling things like that.", 'start': 649.729, 'duration': 5.466}, {'end': 663.56, 'text': 'So the starting point for all of this stuff, I think if you ask someone now, like, what is RAG, they will think of this thing.', 'start': 656.638, 'duration': 6.922}, {'end': 666.88, 'text': 'So this is frozen RAG, basically.', 'start': 664.6, 'duration': 2.28}, {'end': 669.341, 'text': "There's no training here at all.", 'start': 668.221, 'duration': 1.12}, {'end': 673.282, 'text': "So going back to this question of train time, test time, there's only test time here.", 'start': 669.661, 'duration': 3.621}, {'end': 678.703, 'text': "Train time happens separately with these kind of black box models that we don't necessarily have control over.", 'start': 673.322, 
'duration': 5.381}], 'summary': 'Frozen RAG involves no training, only test time; train time happens separately with black box models.', 'duration': 28.974, 'max_score': 649.729, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg649729.jpg'}, {'end': 753.363, 'src': 'embed', 'start': 725.191, 'weight': 4, 'content': [{'end': 732.476, 'text': 'OK, if we want to outperform this frozen thing itself with just the vector database, what would that look like from a retrieval perspective?', 'start': 725.191, 'duration': 7.285}, {'end': 737.809, 'text': 'And the starting point for everything retrieval is TF-IDF.', 'start': 734.226, 'duration': 3.583}, {'end': 743.274, 'text': 'Does everybody know what TF-IDF is? No? Okay.', 'start': 738.05, 'duration': 5.224}, {'end': 753.363, 'text': 'So TF-IDF is basically a sparse retrieval method where you have a score function that looks at documents and queries, so D and Q.', 'start': 743.534, 'duration': 9.829}], 'summary': 'TF-IDF, a sparse retrieval method that scores documents against queries, is the starting point for improving on frozen RAG from the retrieval side.', 'duration': 28.172, 'max_score': 725.191, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg725191.jpg'}, {'end': 976.037, 'src': 'embed', 'start': 929.023, 'weight': 0, 'content': [{'end': 937.778, 'text': "so uh, if they're synonyms, you can still find the relevant document, which you couldn't really do with a sparse representation, right.", 'start': 929.023, 'duration': 8.755}, {'end': 941.686, 'text': "so that's really the advantage of dense is that you get like semantic similarity.", 'start': 937.778, 'duration': 3.908}, {'end': 946.001, 'text': 'So you can do this over word embeddings.', 'start': 943.459, 'duration': 2.542}, {'end': 951.385, 'text': "That doesn't really work all that well, but at the time that people started thinking about this, BERT was already out there.", 'start': 946.021, 'duration': 5.364}, {'end': 
955.968, 'text': 'And BERT is really great for giving you a vector representation for an entire sequence of words.', 'start': 951.425, 'duration': 4.543}, {'end': 958.91, 'text': 'So, a sentence representation or a passage representation.', 'start': 956.368, 'duration': 2.542}, {'end': 964.374, 'text': 'So, there are all these cool systems like ORQA and DPR, the dense passage retriever.', 'start': 959.61, 'duration': 4.764}, {'end': 971.055, 'text': 'where they essentially use the retrieval as a kind of latent variable in the system.', 'start': 965.274, 'duration': 5.781}, {'end': 976.037, 'text': 'And the way to get the latent variable to work to be good enough.', 'start': 971.616, 'duration': 4.421}], 'summary': 'Dense representations provide semantic similarity, utilized by BERT-based systems like ORQA and DPR for effective document retrieval.', 'duration': 47.014, 'max_score': 929.023, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg929023.jpg'}], 'start': 608.815, 'title': 'AI systems and document retrieval', 'summary': 'Covers considerations during AI system training and testing, including the frozen RAG concept, and the evolution of document retrieval methods from TF-IDF to dense models like BERT, ORQA, and DPR.', 'chapters': [{'end': 673.282, 'start': 608.815, 'title': 'Designing and testing AI systems', 'summary': 'Discusses the considerations during training and test time for an AI system, including the questions of how to update and train the system, as well as the potential manipulations during test time. 
It also introduces the concept of frozen RAG, representing a system without any training.', 'duration': 64.467, 'highlights': ['During training time, important considerations include deciding which components of the system to update, how to update them, and whether to pre-train the system from scratch or initialize it with separately trained components.', 'During test time, there are opportunities to manipulate the system, such as giving different indices or altering the sampling process.', 'The concept of frozen RAG represents a system without any training, functioning solely during test time.']}, {'end': 1000.042, 'start': 673.322, 'title': 'Improving document retrieval with dense models', 'summary': 'Discusses the evolution of document retrieval methods, starting from sparse retrieval like TF-IDF to the advantages of dense retrieval using word embeddings and models like BERT, ORQA, and DPR for latent variable retrieval.', 'duration': 326.72, 'highlights': ['The advantages of dense retrieval over sparse retrieval include semantic similarity and the ability to find relevant documents based on synonyms, as demonstrated by models like BERT, ORQA, and DPR.', 'TF-IDF is a sparse retrieval method that uses term frequency and inverse document frequency to score document-query overlap, with refinements like BM25 for better scoring, and is widely used in systems like DrQA for open domain question answering.', 'The use of dense retrieval models like BERT, ORQA, and DPR as latent variables in the system, and the importance of pre-training the retriever on relevant information for effective system training.']}], 'duration': 391.227, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg608815.jpg', 'highlights': ['The advantages of dense retrieval over sparse retrieval include semantic similarity and the ability to find relevant documents based on synonyms, as demonstrated by models like BERT, ORQA, and DPR.', 'During 
training time, important considerations include deciding which components of the system to update, how to update them, and whether to pre-train the system from scratch or initialize it with separately trained components.', 'The use of dense retrieval models like BERT, ORQA, and DPR as latent variables in the system, and the importance of pre-training the retriever on relevant information for effective system training.', 'During test time, there are opportunities to manipulate the system, such as giving different indices or altering the sampling process.', 'TF-IDF is a sparse retrieval method that uses term frequency and inverse document frequency to score document-query overlap, with refinements like BM25 for better scoring, and is widely used in systems like DrQA for open domain question answering.', 'The concept of frozen RAG represents a system without any training, functioning solely during test time.']}, {'end': 1889.645, 'segs': [{'end': 1080.486, 'src': 'embed', 'start': 1046.144, 'weight': 0, 'content': [{'end': 1050.366, 'text': "So all the popular ones, they're sort of re-implementations of this FAISS idea.", 'start': 1046.144, 'duration': 4.222}, {'end': 1053.608, 'text': "One is in Rust, one is in Go, but it's all basically the same idea.", 'start': 1050.386, 'duration': 3.222}, {'end': 1054.649, 'text': "It's just FAISS.", 'start': 1053.628, 'duration': 1.021}, {'end': 1058.771, 'text': 'And so FAISS really powers a lot of this stuff.', 'start': 1056.25, 'duration': 2.521}, {'end': 1065.455, 'text': 'And whenever somebody tells you something about a vector database, just think about FAISS, very fast dot product.', 'start': 1059.792, 'duration': 5.663}, {'end': 1071.218, 'text': 'So obviously, you can go beyond dot product, yes.', 'start': 1068.216, 'duration': 3.002}, {'end': 1080.486, 'text': "What is FAISS? 
So it's an open source library, Facebook AI Similarity Search.", 'start': 1073.635, 'duration': 6.851}], 'summary': 'Popular implementations in Rust and Go are all re-implementations of the FAISS idea, which powers a lot of vector database applications for very fast dot product similarity search.', 'duration': 34.342, 'max_score': 1046.144, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg1046144.jpg'}, {'end': 1162.035, 'src': 'embed', 'start': 1135.921, 'weight': 2, 'content': [{'end': 1140.663, 'text': 'So you have two different BERT models or whatever your encoder is here.', 'start': 1135.921, 'duration': 4.742}, {'end': 1142.344, 'text': 'And then at the end, you get these two vectors.', 'start': 1140.703, 'duration': 1.641}, {'end': 1143.924, 'text': 'And then you just do dot products.', 'start': 1142.504, 'duration': 1.42}, {'end': 1145.225, 'text': 'So you get one single score.', 'start': 1143.944, 'duration': 1.281}, {'end': 1153.11, 'text': "But you can do all kinds of much fancier things if you're willing to give up on this bi-encoder approach.", 'start': 1145.785, 'duration': 7.325}, {'end': 1158.053, 'text': 'So a really nice example from one of your colleagues here at Stanford is ColBERT.', 'start': 1153.41, 'duration': 4.643}, {'end': 1162.035, 'text': 'So what this does is late interaction.', 'start': 1159.513, 'duration': 2.522}], 'summary': 'Using two BERT models to produce two vectors, then calculate dot products to get a single score. 
can explore more advanced methods with ColBERT for late interaction.', 'duration': 26.114, 'max_score': 1135.921, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg1135921.jpg'}, {'end': 1226.719, 'src': 'embed', 'start': 1183.591, 'weight': 1, 'content': [{'end': 1186.474, 'text': 'But just so you know, if you run into it.', 'start': 1183.591, 'duration': 2.883}, {'end': 1194.862, 'text': 'Um, so, um but, but I think if we look at kind of where the state of the art has been going now,', 'start': 1187.717, 'duration': 7.145}, {'end': 1198.944, 'text': "one of the nice things about these vector databases is that they're super efficient right?", 'start': 1194.862, 'duration': 4.082}, {'end': 1202.186, 'text': 'So a dot product is much more efficient than this late interaction stuff.', 'start': 1198.964, 'duration': 3.222}, {'end': 1204.828, 'text': 'Especially if you do the approximate nearest neighbor search.', 'start': 1202.226, 'duration': 2.602}, {'end': 1207.489, 'text': "But there's been some really cool work.", 'start': 1206.008, 'duration': 1.481}, {'end': 1211.232, 'text': 'So, things like SPLADE, uh, basically.', 'start': 1208.05, 'duration': 3.182}, {'end': 1214.593, 'text': 'have sparse meet dense in a way.', 'start': 1212.012, 'duration': 2.581}, {'end': 1219.456, 'text': "so one of the big problems, as i said, with sparse is that you can't really handle synonyms and things like that.", 'start': 1214.593, 'duration': 4.863}, {'end': 1223.398, 'text': 'but what you could do is take a dense model like a BERT model.', 'start': 1219.456, 'duration': 3.942}, {'end': 1226.719, 'text': 'look at kind of this, this one word in your sequence.', 'start': 1223.398, 'duration': 3.321}], 'summary': 'Vector databases offer super efficiency, especially in approximate nearest neighbor search, addressing the challenge of sparse data and handling synonyms.', 'duration': 43.128, 'max_score': 1183.591, 
'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg1183591.jpg'}, {'end': 1311.773, 'src': 'embed', 'start': 1286.136, 'weight': 7, 'content': [{'end': 1292.121, 'text': 'uh, retrieval in general is that, is that what we see happening right now, if you look at sort of the developer community around RAG,', 'start': 1286.136, 'duration': 5.985}, {'end': 1294.763, 'text': "is that they're all doing hybrid search right now.", 'start': 1292.121, 'duration': 2.642}, {'end': 1301.328, 'text': 'uh, so you can actually just combine the search results from your sparse BM25 or whatever thing, or SPLADE,', 'start': 1294.763, 'duration': 6.565}, {'end': 1306.631, 'text': 'and you can combine them with your DRAGON, and then you get this ranking that works even better.', 'start': 1301.328, 'duration': 5.303}, {'end': 1311.773, 'text': 'So then you kind of get best of both worlds, but then you get all these questions about how do you combine the results.', 'start': 1307.212, 'duration': 4.561}], 'summary': 'Developers are currently employing hybrid search, combining search results from different sources for improved ranking.', 'duration': 25.637, 'max_score': 1286.136, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg1286136.jpg'}, {'end': 1384.336, 'src': 'embed', 'start': 1331.607, 'weight': 4, 'content': [{'end': 1336.412, 'text': 'directly asking the large language model the question, have there been any benchmarking studies on this?', 'start': 1331.607, 'duration': 4.805}, {'end': 1343.692, 'text': "Yeah, so there's a great paper, if I can say so myself, on the fact that retrieval augmentation reduces hallucination.", 'start': 1337.509, 'duration': 6.183}, {'end': 1345.953, 'text': "It's from 2021, I think.", 'start': 1344.392, 'duration': 1.561}, {'end': 1353.136, 'text': "So yeah, you can just find, if you literally look for retrieval augmentation reduces hallucination, 
then you'll find the paper.", 'start': 1347.013, 'duration': 6.123}, {'end': 1354.597, 'text': 'Oh, thank you.', 'start': 1354.116, 'duration': 0.481}, {'end': 1359.659, 'text': "This is, we'll see.", 'start': 1358.238, 'duration': 1.421}, {'end': 1362.28, 'text': 'this is a picture of the dense approach.', 'start': 1359.659, 'duration': 2.621}, {'end': 1363.661, 'text': 'and why do we need sparse?', 'start': 1362.28, 'duration': 1.381}, {'end': 1365.947, 'text': 'Yeah, so,', 'start': 1364.266, 'duration': 1.681}, {'end': 1375.071, 'text': "very often you want to have a very precise word overlap for things where you don't want to have the synonyms or the kind of nearest neighbors.", 'start': 1365.947, 'duration': 9.124}, {'end': 1382.235, 'text': "So if there's like a brand name or something like that, then like let's say the brand is Apple.", 'start': 1375.312, 'duration': 6.923}, {'end': 1384.336, 'text': "You don't want to find stuff about pears.", 'start': 1382.615, 'duration': 1.721}], 'summary': 'Retrieval augmentation reduces hallucination according to a 2021 paper.', 'duration': 52.729, 'max_score': 1331.607, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg1331607.jpg'}, {'end': 1714.977, 'src': 'embed', 'start': 1691.418, 'weight': 5, 'content': [{'end': 1702.988, 'text': "So we're slowly progressing towards having a system that is much more optimized for being properly retrieval augmented in a way where it's useful and contextualized for what you want to use it for.", 'start': 1691.418, 'duration': 11.57}, {'end': 1708.332, 'text': 'So yeah, just to point out kind of what that looks like with this re-ranker.', 'start': 1704.989, 'duration': 3.343}, {'end': 1714.977, 'text': 'So you just have this extra step essentially, right? 
So we have our retriever, then we have a re-ranker, then we have our generator and our output.', 'start': 1708.372, 'duration': 6.605}], 'summary': 'Progressing towards an optimized retrieval system with a re-ranker and generator.', 'duration': 23.559, 'max_score': 1691.418, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg1691418.jpg'}, {'end': 1833.589, 'src': 'embed', 'start': 1803.971, 'weight': 6, 'content': [{'end': 1807.233, 'text': 'uh, so instead of that, we have everything much closer and learning together.', 'start': 1803.971, 'duration': 3.262}, {'end': 1817.46, 'text': 'um, so, um, one of the first, uh, ways of doing this with the generator, uh, was RAG, retrieval-augmented generation, uh,', 'start': 1807.233, 'duration': 10.227}, {'end': 1818.541, 'text': 'which we did at FAIR in 2020.', 'start': 1817.46, 'duration': 1.081}, {'end': 1823.742, 'text': "um, and it's very similar to what we've already seen.", 'start': 1818.541, 'duration': 5.201}, {'end': 1827.405, 'text': 'We basically have this retriever here that works over different documents.', 'start': 1823.863, 'duration': 3.542}, {'end': 1833.589, 'text': 'You get some score function that gets given to this generator that generates the answer.', 'start': 1827.445, 'duration': 6.144}], 'summary': 'RAG (retrieval-augmented generation) was developed at FAIR in 2020.', 'duration': 29.618, 'max_score': 1803.971, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg1803971.jpg'}], 'start': 1000.042, 'title': 'Vector database and retrieval techniques', 'summary': 'Discusses the efficiency of dot products in vector databases, mentioning the FAISS library, Siamese networks, late interaction, SPLADE, and DRAGON, and the trend towards hybrid search, with a mention of a paper on retrieval augmentation reducing hallucination. 
It also covers the use of dense retrievers in information retrieval systems, including approaches to contextualize the retriever for the generator, leading to improved system optimization.', 'chapters': [{'end': 1354.597, 'start': 1000.042, 'title': 'Vector database and retrieval techniques', 'summary': 'Discusses the efficiency of dot products in vector databases, mentioning the FAISS library, Siamese networks, late interaction, SPLADE, and DRAGON, and the trend towards hybrid search, with a mention of a paper on retrieval augmentation reducing hallucination.', 'duration': 354.555, 'highlights': ['The FAISS library powers modern vector databases and enables very fast dot product computation, contributing to efficient retrieval techniques.', 'The Siamese network involves using two different BERT models and computing a dot product to obtain a single score, while late interaction and DRAGON introduce more complex scoring methods for retrieval.', 'SPLADE addresses the issue of handling synonyms in sparse models by combining sparse and dense representations for more efficient search, while DRAGON is highlighted as a recommended dense retriever with progressive data augmentation.', 'The developer community is currently engaging in hybrid search, combining results from different retrieval techniques to achieve improved rankings.', 'A paper from 2021 discusses the reduction of hallucination through retrieval augmentation, providing a benchmark for evaluating the impact on closed-book question answering.', 'The efficiency of dot products in vector databases and the trend towards hybrid search are emphasized as key points in the chapter.']}, {'end': 1889.645, 'start': 1358.238, 'title': 'Retrieval augmented generation', 'summary': 'Discusses the use of dense retrievers in information retrieval systems, including approaches to contextualize the retriever for the generator, such as the use of a dense retriever for document retrieval and a re-ranker to backprop into a BERT 
model, leading to improved system optimization.', 'duration': 531.407, 'highlights': ["The use of dense retrievers for document retrieval and the importance of precise word overlap for things like brand names, as exemplified by the case of the brand 'Apple' and avoiding retrieval of unrelated information about 'pears'.", 'The concept of contextualizing the retriever for the generator, as demonstrated by the approach of using a dense retriever for document retrieval and a re-ranker to backprop into a BERT model, resulting in a more optimized system for contextualized use.', 'The method of using a dense retriever for document retrieval and computing the likelihood of retrieved documents, followed by supplying each retrieved document separately to a language model to minimize the KL divergence and retrieve documents leading to the lowest perplexity on the right answer for the language model.', 'The discussion on methods for optimizing both the retriever and the generator, leading to a more cohesive and contextualized architecture where everything works together, as exemplified by the RAG (Retrieval Augmented Generation) model and its approach to updating both the retriever and the generator.']}], 'duration': 889.603, 'thumbnail': '', 'highlights': ['The FACE library powers modern vector databases for efficient retrieval techniques.', 'The efficiency of dot products in vector databases and the trend towards hybrid search are emphasized as key points in the chapter.', 'The Siamese network involves using two different BERT models and computing a dot product to obtain a single score, while late interaction and Dragon introduce more complex scoring methods for retrieval.', 'Splayed addresses the issue of handling synonyms in sparse models by combining sparse and dense representations for more efficient search, while Dragon is highlighted as a recommended dense retriever with progressive data augmentation.', "The use of dense retrievers for document retrieval and the 
importance of precise word overlap for things like brand names, as exemplified by the case of the brand 'Apple' and avoiding retrieval of unrelated information about 'pears'.", 'The concept of contextualizing the retriever for the generator, as demonstrated by the approach of using a dense retriever for document retrieval and a re-ranker to backprop into a BERT model, resulting in a more optimized system for contextualized use.', 'The discussion on methods for optimizing both the retriever and the generator, leading to a more cohesive and contextualized architecture where everything works together, as exemplified by the RAG (Retrieval Augmented Generation) model and its approach to updating both the retriever and the generator.', 'The developer community is currently engaging in hybrid search, combining results from different retrieval techniques to achieve improved rankings.', 'A paper from 2021 discusses the reduction of hallucination through retrieval augmentation, providing a benchmark for evaluating the impact on closed book question answering.']}, {'end': 2221.395, 'segs': [{'end': 1925.044, 'src': 'embed', 'start': 1889.645, 'weight': 3, 'content': [{'end': 1892.167, 'text': 'um and and what we do in this paper?', 'start': 1889.645, 'duration': 2.522}, {'end': 1896.431, 'text': "basically the whole point of the paper is that this frozen thing doesn't really work all that well.", 'start': 1892.167, 'duration': 4.264}, {'end': 1903.634, 'text': 'So I think what people call RAG now usually refers to the frozen thing,', 'start': 1897.411, 'duration': 6.223}, {'end': 1907.976, 'text': 'but the whole paper basically would never have been accepted anywhere if we had just done the frozen thing.', 'start': 1903.634, 'duration': 4.342}, {'end': 1917.06, 'text': "The whole point of the paper is that you want to optimize it, and so at my company contextual, we call this frozen thing frankenstein's monster,", 'start': 1908.236, 'duration': 8.824}, {'end': 1921.082, 
'text': "because it's really like you cobble together these different pieces, right, you sort of.", 'start': 1917.06, 'duration': 4.022}, {'end': 1922.623, 'text': "yeah, it's really like Frankenstein.", 'start': 1921.082, 'duration': 1.541}, {'end': 1925.044, 'text': 'you just put it together and then it sort of walks.', 'start': 1922.623, 'duration': 2.421}], 'summary': "The paper argues that the frozen version of RAG doesn't work well and should be optimized; the frozen setup is nicknamed 'Frankenstein's monster'.", 'duration': 35.399, 'max_score': 1889.645, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg1889645.jpg'}, {'end': 1997.345, 'src': 'embed', 'start': 1971.646, 'weight': 0, 'content': [{'end': 1980.388, 'text': 'so this is FiD, Fusion-in-Decoder, and, as you can see, this scales to a much higher number of passages,', 'start': 1971.646, 'duration': 8.742}, {'end': 1985.269, 'text': 'and that leads to corresponding improvements in the scores that you care about.', 'start': 1980.388, 'duration': 4.881}, {'end': 1988.516, 'text': "So that's a really cool idea.", 'start': 1987.335, 'duration': 1.181}, {'end': 1993.341, 'text': "And so we're slowly moving towards more decoder-only architectures.", 'start': 1988.596, 'duration': 4.745}, {'end': 1995.282, 'text': 'So in RAG, we have this BART model.', 'start': 1993.581, 'duration': 1.701}, {'end': 1997.345, 'text': "It's sort of an encoder-decoder architecture.", 'start': 1995.322, 'duration': 2.023}], 'summary': 'FiD (Fusion-in-Decoder) scales to a larger number of passages, leading to improved scores. 
moving towards decoder-only architectures.', 'duration': 25.699, 'max_score': 1971.646, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg1971646.jpg'}, {'end': 2133.052, 'src': 'embed', 'start': 2088.052, 'weight': 1, 'content': [{'end': 2096.235, 'text': 'where they showed that you can have a 25 times smaller retrieval augmented language model trained from scratch, so really pre-trained,', 'start': 2088.052, 'duration': 8.183}, {'end': 2097.676, 'text': 'entirely from scratch.', 'start': 2096.235, 'duration': 1.441}, {'end': 2104.438, 'text': 'that outperforms this 25 times bigger language model on the same data in terms of perplexity, which is pretty impressive.', 'start': 2097.676, 'duration': 6.762}, {'end': 2111.701, 'text': 'So this architecture is much more efficient than a parametric model because you can rely on this external memory.', 'start': 2105.259, 'duration': 6.442}, {'end': 2116.303, 'text': 'So if your external memory is big enough, you can get pretty huge gains.', 'start': 2112.081, 'duration': 4.222}, {'end': 2122.524, 'text': "So there was a lot of excitement about Retro when it was announced, but it's a deep mind paper.", 'start': 2117.54, 'duration': 4.984}, {'end': 2127.327, 'text': "So there's really no open source, nothing really to validate that this actually works.", 'start': 2122.564, 'duration': 4.763}, {'end': 2133.052, 'text': 'And so very recently, there has been a bit of work from NVIDIA called Retro++.', 'start': 2128.729, 'duration': 4.323}], 'summary': 'A 25 times smaller retrieval augmented language model outperforms a 25 times bigger language model in terms of perplexity, showing its efficiency and potential for huge gains.', 'duration': 45, 'max_score': 2088.052, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg2088052.jpg'}], 'start': 1889.645, 'title': 'Challenges with frozen thing and retro architecture 
enhancements', 'summary': "Discusses the ineffectiveness of the 'frozen thing' and the need for improvement, as well as the limitations of the original rag architecture and its enhancements, including the fusion and decoder approach that scales to a higher number of passages, leading to improvements in scores. it also addresses the recent efficiency enhancements by nvidia and the challenge of reproducing and validating the effectiveness of these models.", 'chapters': [{'end': 1925.044, 'start': 1889.645, 'title': 'Challenges with frozen thing', 'summary': "Discusses the ineffectiveness of the 'frozen thing' in optimization and highlights the need to improve it, referring to it as 'frankenstein's monster' due to its piecemeal nature.", 'duration': 35.399, 'highlights': ["The frozen thing doesn't work well, leading the paper to emphasize the need for optimization and improvement.", "Referring to the frozen thing as 'frankenstein's monster' illustrates its piecemeal nature and the need for a more cohesive solution."]}, {'end': 2221.395, 'start': 1925.044, 'title': 'Retro architecture and its enhancements', 'summary': 'Discusses the limitations of the original rag architecture and presents the fusion and decoder approach which scales to a higher number of passages, leading to corresponding improvements in scores. 
it also delves into the efficiency of the retro architecture and its recent enhancements by nvidia, highlighting the challenge of reproducing and validating the effectiveness of these models.', 'duration': 296.351, 'highlights': ['The fusion and decoder approach scales to a much higher number of passages, leading to corresponding improvements in the scores (quantifiable data).', 'The Retro architecture, when exploited by the paper called Retro out of DeepMind, resulted in a 25 times smaller retrieval augmented language model trained from scratch, outperforming a 25 times bigger language model on the same data in terms of perplexity (quantifiable data).', 'The recent work from NVIDIA called Retro++ combines the retro architecture with elements of RAG, showing promising results but highlighting the challenge of reproducing and validating the effectiveness of these models.']}], 'duration': 331.75, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg1889645.jpg', 'highlights': ['The fusion and decoder approach scales to a much higher number of passages, leading to corresponding improvements in the scores (quantifiable data)', 'The recent work from NVIDIA called Retro++ combines the retro architecture with elements of RAG, showing promising results but highlighting the challenge of reproducing and validating the effectiveness of these models', 'The Retro architecture, when exploited by the paper called Retro out of DeepMind, resulted in a 25 times smaller retrieval augmented language model trained from scratch, outperforming a 25 times bigger language model on the same data in terms of perplexity (quantifiable data)', "The frozen thing doesn't work well, leading the paper to emphasize the need for optimization and improvement", "Referring to the frozen thing as 'frankenstein's monster' illustrates its piecemeal nature and the need for a more cohesive solution"]}, {'end': 2935.351, 'segs': [{'end': 2275.954, 'src': 
'embed', 'start': 2238.606, 'weight': 1, 'content': [{'end': 2244.11, 'text': "And retro kind of showed that it might be possible, but we don't necessarily know exactly how to do it the right way.", 'start': 2238.606, 'duration': 5.504}, {'end': 2247.032, 'text': 'And this is really one of the interesting open questions.', 'start': 2244.43, 'duration': 2.602}, {'end': 2260.737, 'text': 'Any questions on that? Online? No? Okay.', 'start': 2249.267, 'duration': 11.47}, {'end': 2261.878, 'text': "Then we'll move on.", 'start': 2261.298, 'duration': 0.58}, {'end': 2269.573, 'text': "let's go all the way with the contextualization now.", 'start': 2266.932, 'duration': 2.641}, {'end': 2275.954, 'text': 'So with Retro and with RAG, what we actually did is we only updated the query encoder.', 'start': 2269.673, 'duration': 6.281}], 'summary': 'Retro and rag updated only the query encoder', 'duration': 37.348, 'max_score': 2238.606, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg2238606.jpg'}, {'end': 2385.167, 'src': 'embed', 'start': 2349.386, 'weight': 0, 'content': [{'end': 2352.648, 'text': 'Well, they need us to make that internet will change.', 'start': 2349.386, 'duration': 3.262}, {'end': 2363.766, 'text': 'change the entire business.', 'start': 2362.072, 'duration': 1.694}, {'end': 2368.342, 'text': "Yeah, that's one way to do it.", 'start': 2367.142, 'duration': 1.2}, {'end': 2372.944, 'text': 'So there are a bunch of different ways to update the document encoder.', 'start': 2368.863, 'duration': 4.081}, {'end': 2377.925, 'text': 'So what they do in Realm is they basically do it for T batches.', 'start': 2373.184, 'duration': 4.741}, {'end': 2382.706, 'text': 'Then they stop, they re-encode the entire internet, and then they train again.', 'start': 2378.685, 'duration': 4.021}, {'end': 2385.167, 'text': "So it's sort of asynchronous updates.", 'start': 2383.547, 'duration': 1.62}], 'summary': 'Realm 
trains for T batches, then re-encodes the entire index and trains again, giving asynchronous updates.', 'duration': 35.781, 'max_score': 2349.386, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg2349386.jpg'}, {'end': 2549.138, 'src': 'embed', 'start': 2518.855, 'weight': 2, 'content': [{'end': 2521.877, 'text': 'um, and so this one, i think, is actually quite elegant.', 'start': 2518.855, 'duration': 3.022}, {'end': 2523.918, 'text': 'it does, because that really gets to like.', 'start': 2521.877, 'duration': 2.041}, {'end': 2527.381, 'text': 'how valuable is this one single document for me?', 'start': 2523.918, 'duration': 3.463}, {'end': 2529.702, 'text': 'answering this question correctly?', 'start': 2527.381, 'duration': 2.321}, {'end': 2536.526, 'text': 'um so, uh, they compare all of these different versions and, uh, what you can see is that, uh,', 'start': 2529.702, 'duration': 6.824}, {'end': 2541.95, 'text': 'the kind of REPLUG-style loss and this leave-one-out loss, they performed a lot better than all of these others.', 'start': 2536.526, 'duration': 5.424}, {'end': 2545.194, 'text': 'So this fixed retriever or no joint pre-training?', 'start': 2542.33, 'duration': 2.864}, {'end': 2549.138, 'text': 'these are really kind of the baseline sort of frozen RAG models or closed book.', 'start': 2545.194, 'duration': 3.944}], 'summary': 'Among the compared retriever training losses, the REPLUG-style loss and the leave-one-out loss performed better than the others.', 'duration': 30.283, 'max_score': 2518.855, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg2518855.jpg'}, {'end': 2663.866, 'src': 'embed', 'start': 2634.846, 'weight': 3, 'content': [{'end': 2636.566, 'text': 'And quite surprisingly,', 'start': 2634.846, 'duration': 1.72}, {'end': 2645.191, 'text': 'I think they find that just updating the query, so like in the original RAG paper, is actually already basically good 
enough in many cases.', 'start': 2636.566, 'duration': 8.625}, {'end': 2651.396, 'text': "so that's nice, because it's much more efficient if you don't have to update your documents all the time.", 'start': 2645.191, 'duration': 6.205}, {'end': 2658.382, 'text': 'uh, i think the real question here, though, is like uh, how good is your document representation to begin with?', 'start': 2651.396, 'duration': 6.986}, {'end': 2661.845, 'text': 'so you need to have a very, very high quality embedding model for this to work.', 'start': 2658.382, 'duration': 3.463}, {'end': 2663.866, 'text': "if you don't have that, then this will not work.", 'start': 2661.845, 'duration': 2.021}], 'summary': 'Updating only the query encoder, as in the original RAG paper, is often good enough and more efficient, but it requires a high-quality embedding model.', 'duration': 29.02, 'max_score': 2634.846, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg2634846.jpg'}, {'end': 2749.647, 'src': 'embed', 'start': 2719.449, 'weight': 5, 'content': [{'end': 2720.87, 'text': 'Other questions? Sure.', 'start': 2719.449, 'duration': 1.421}, {'end': 2730.663, 'text': 'Are the documents in the training set the same as those in the template, or do you change the set of documents? 
Yeah, so they can be different.', 'start': 2722.679, 'duration': 7.984}, {'end': 2735.816, 'text': 'So in Atlas, Atlas basically tries everything.', 'start': 2732.914, 'duration': 2.902}, {'end': 2743.402, 'text': 'So they also try to see what happens if I train this on Wikipedia, but I swap in like a sort of common crawl index.', 'start': 2736.517, 'duration': 6.885}, {'end': 2749.647, 'text': "And I think so in Atlas, but also in retro domain finding, it's just the more the better.", 'start': 2744.163, 'duration': 5.484}], 'summary': 'Atlas tries various training sets, including wikipedia and common crawl, to improve domain finding.', 'duration': 30.198, 'max_score': 2719.449, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg2719449.jpg'}, {'end': 2806.164, 'src': 'embed', 'start': 2779.151, 'weight': 4, 'content': [{'end': 2782.794, 'text': 'So it introduces a lot of these new architectural changes,', 'start': 2779.151, 'duration': 3.643}, {'end': 2787.518, 'text': 'like the sliding window attention to handle longer sequences at a smaller cost in the group.', 'start': 2782.794, 'duration': 4.724}, {'end': 2790.159, 'text': 'query attention for faster inference.', 'start': 2787.978, 'duration': 2.181}, {'end': 2800.962, 'text': "i'd like i'd like to like know your thoughts on designing a generator specifically for rag leveraging, for example, where mistral 7b currently is,", 'start': 2790.159, 'duration': 10.803}, {'end': 2806.164, 'text': 'because, for example, like the sliding window attention, i could see how that could be adapted to the rag case.', 'start': 2800.962, 'duration': 5.202}], 'summary': 'Introducing new architectural changes for faster inference and handling longer sequences at a smaller cost.', 'duration': 27.013, 'max_score': 2779.151, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg2779151.jpg'}], 'start': 2221.475, 'title': 
'Document encoder efficiency and rag system generator design', 'summary': 'Discusses the efficiency of updating document encoders in retrieval augmentation methods, proposing asynchronous update mechanisms. it also covers a comprehensive analysis of atlas and retriever, highlighting the importance of high-quality document representation, and discusses the rag system generator design, including training set variance, architectural changes, and the use of sliding window attention.', 'chapters': [{'end': 2426.122, 'start': 2221.475, 'title': 'Efficiency of updating document encoder', 'summary': 'Discusses the efficiency of updating the document encoder in retrieval augmentation methods, highlighting the challenges of re-encoding the entire internet after every batch update and proposing asynchronous update mechanisms as potential solutions.', 'duration': 204.647, 'highlights': ['Updating the document encoder is expensive due to the need to re-encode the entire internet after every batch update, which can be inefficient when dealing with trillions of tokens.', "The paper 'Realm' introduced a method involving asynchronous updates and sharding mechanisms to address the inefficiency of updating the document encoder in retrieval augmentation methods.", 'Retro and RAG focused on updating the query encoder instead of the document encoder to mitigate the high cost of updating the document encoder in retrieval augmentation methods.']}, {'end': 2718.888, 'start': 2426.262, 'title': 'Atlas and retriever: a comprehensive analysis', 'summary': 'Discusses the comprehensive analysis of atlas and retriever paper, comparing different versions, training methods, and system optimizations, indicating the superiority of certain methods and highlighting the importance of high-quality document representation for retrieval augmented models.', 'duration': 292.626, 'highlights': ['The comparison of different versions of retriever training methods indicates that the RePlug style loss and 
leave-one-out loss perform significantly better than others, demonstrating the effectiveness of optimizing retriever training methods.', "The experiment with different training data and tasks reveals that the retrieval system's performance is influenced by the compatibility of the training method with the language model, emphasizing the importance of aligning the retrieval method with the language model expectations.", 'The finding that updating the query, as in the original RAG paper, is sufficient in many cases for retriever updates highlights the efficiency of this method and the necessity of high-quality document representation for effective retrieval performance.']}, {'end': 2935.351, 'start': 2719.449, 'title': 'RAG system generator design', 'summary': 'Discusses swapping the retrieval index between training and test time in Atlas, the architectural changes in Mistral 7B (sliding window attention for longer sequences, grouped-query attention for faster inference), and how sliding window attention might be adapted to a generator designed for RAG.', 'duration': 215.902, 'highlights': ['Mistral 7B introduces architectural changes such as sliding window attention for longer sequences at a smaller cost and grouped-query attention for faster inference.', 'Atlas experiments with training on Wikipedia and swapping in a Common Crawl index; the finding is that a larger index generally helps.', "Discussion on adapting Mistral's sliding window attention to make it better for the RAG case by using a dynamic window instead of a fixed window."]}], 'duration': 713.876, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg2221475.jpg', 'highlights': ['Asynchronous updates and sharding mechanisms address inefficiency of updating document encoder', 'Retro and RAG focus on updating query encoder to mitigate high cost of updating document encoder', 'REPLUG-style loss and leave-one-out loss perform significantly better in retriever training methods', 'Efficiency of updating query, as in the original RAG paper, is sufficient in many cases for retriever updates', 'Mistral 7B introduces 
architectural changes for faster inference, such as sliding window attention and query attention', 'Atlas tries different training sets, including Wikipedia and common crawl index, to improve predictions based on index size']}, {'end': 3565.352, 'segs': [{'end': 2978.818, 'src': 'embed', 'start': 2955.184, 'weight': 3, 'content': [{'end': 2967.053, 'text': "it's like not too crazy to say are there any architectural changes that we can introduce into these seven billion parameter models so that they could be better adapted to the rag case?", 'start': 2955.184, 'duration': 11.869}, {'end': 2969.875, 'text': 'yeah, so, uh, there there might be.', 'start': 2967.053, 'duration': 2.822}, {'end': 2970.896, 'text': 'yeah, i i think.', 'start': 2969.875, 'duration': 1.021}, {'end': 2976.18, 'text': "one one question is just how do you, how do you do the attention over things you've retrieved right, which i think is what you're?", 'start': 2970.896, 'duration': 5.284}, {'end': 2978.818, 'text': 'yeah, thanks.', 'start': 2977.998, 'duration': 0.82}], 'summary': 'Exploring architectural changes for better adaptation to the rag case in seven billion parameter models.', 'duration': 23.634, 'max_score': 2955.184, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg2955184.jpg'}, {'end': 3123.914, 'src': 'embed', 'start': 3095.305, 'weight': 4, 'content': [{'end': 3103.181, 'text': 'Yeah, so you do want to update the retriever, but only part of the retriever is necessary to be updated for a lot of these cases.', 'start': 3095.305, 'duration': 7.876}, {'end': 3112.17, 'text': 'But so I think it, so these are very specific data sets, right? 
Natural questions, Wizard of Wikipedia and Fever.', 'start': 3104.807, 'duration': 7.363}, {'end': 3116.051, 'text': "So they're really very kind of knowledge intensive tasks.", 'start': 3112.25, 'duration': 3.801}, {'end': 3123.914, 'text': 'So in that case, if you already have a very good system, like DPR, that is specifically pre-trained for those tasks,', 'start': 3116.932, 'duration': 6.982}], 'summary': 'Updating only part of the retriever is necessary for knowledge-intensive tasks using specific datasets like natural questions, wizard of wikipedia, and fever.', 'duration': 28.609, 'max_score': 3095.305, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg3095305.jpg'}, {'end': 3206.163, 'src': 'embed', 'start': 3139.525, 'weight': 5, 'content': [{'end': 3151.419, 'text': 'So I think that in this very large handset, we can actually decouple the original document and generate the original document.', 'start': 3139.525, 'duration': 11.894}, {'end': 3161.907, 'text': 'with the documents and those good models.', 'start': 3156.493, 'duration': 5.414}, {'end': 3167.759, 'text': 'Yeah, but so you need to learn how to kind of query into that index.', 'start': 3164.056, 'duration': 3.703}, {'end': 3173.983, 'text': "So if you don't do that, then you don't get really good performance.", 'start': 3168.359, 'duration': 5.624}, {'end': 3176.825, 'text': "So that's sort of like your closed book performance.", 'start': 3174.023, 'duration': 2.802}, {'end': 3183.229, 'text': "If you just have the language model and you're just like what does the parametric model on its own, without the retrieval?", 'start': 3176.945, 'duration': 6.284}, {'end': 3184.21, 'text': 'what does it actually know?', 'start': 3183.229, 'duration': 0.981}, {'end': 3186.952, 'text': 'As you can see, there are pretty big gaps there.', 'start': 3184.97, 'duration': 1.982}, {'end': 3195.343, 'text': 'Other questions? 
Otherwise, I will cover other questions.', 'start': 3192.679, 'duration': 2.664}, {'end': 3201.333, 'text': 'No? Hello? Yeah, go for it.', 'start': 3198.969, 'duration': 2.364}, {'end': 3203.821, 'text': 'A quick question.', 'start': 3201.88, 'duration': 1.941}, {'end': 3206.163, 'text': 'What about more hierarchical retrieval?', 'start': 3203.921, 'duration': 2.242}], 'summary': 'Decoupling the original document can improve performance in large handset with language models and retrieval.', 'duration': 66.638, 'max_score': 3139.525, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg3139525.jpg'}, {'end': 3294.242, 'src': 'embed', 'start': 3268.788, 'weight': 0, 'content': [{'end': 3276.433, 'text': "So there's a trend where we want to have very long context language models so that basically you can take Harry Potter or something,", 'start': 3268.788, 'duration': 7.645}, {'end': 3284.498, 'text': "just put it into context and then ask a question like what is the name of Harry Potter's owl or something? 
And then it can just attend over the entire thing.", 'start': 3276.433, 'duration': 8.065}, {'end': 3290.402, 'text': 'So attending over all of Harry Potter to answer that one question is super inefficient.', 'start': 3285.599, 'duration': 4.803}, {'end': 3294.242, 'text': 'Right? So most of Harry Potter has nothing to do with the owl.', 'start': 3291.041, 'duration': 3.201}], 'summary': "Attending over an entire long context, such as all of Harry Potter, to answer a single question is inefficient.", 'duration': 25.454, 'max_score': 3268.788, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg3268788.jpg'}, {'end': 3402.258, 'src': 'embed', 'start': 3375.897, 'weight': 1, 'content': [{'end': 3381.661, 'text': "and I'm going to learn when I want to expend the compute budget on doing the retrieval.", 'start': 3375.897, 'duration': 5.764}, {'end': 3387.985, 'text': 'So, a nice paper where they have a stab at this is called FLARE, for active retrieval augmentation,', 'start': 3382.802, 'duration': 5.183}, {'end': 3394.951, 'text': 'where they basically have the language model decide when it should do a search and what it should do the search for.', 'start': 3388.505, 'duration': 6.446}, {'end': 3402.258, 'text': 'So I think this fits in a general trend that you can see in the field around agents.', 'start': 3396.673, 'duration': 5.585}], 'summary': 'The FLARE paper has the language model decide when to search and what to search for.', 'duration': 26.361, 'max_score': 3375.897, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg3375897.jpg'}, {'end': 3548.921, 'src': 'embed', 'start': 3498.59, 'weight': 2, 'content': [{'end': 3508.094, 'text': 'so this massive index of all the stuff on the internet, including some things that are maybe higher risk, you can still have them in your index.', 'start': 3498.59, 'duration': 9.504}, {'end': 3511.536, 'text': 'but your language model, uh, your retrieval 
augmented language model,', 'start': 3508.094, 'duration': 3.442}, {'end': 3515.598, 'text': 'i should say you know that that thing is safe because it was trained on data that is public domain.', 'start': 3511.536, 'duration': 4.062}, {'end': 3519.2, 'text': "So that's what they do in silo and they show that that works really well.", 'start': 3516.438, 'duration': 2.762}, {'end': 3527.106, 'text': "So that's one possible solution to a lot of the compliance and legal risk around language model deployments.", 'start': 3519.761, 'duration': 7.345}, {'end': 3537.333, 'text': "There's a great paper also from one of your colleagues around context getting lost in the middle.", 'start': 3529.768, 'duration': 7.565}, {'end': 3539.675, 'text': 'I think this is also a fascinating phenomenon.', 'start': 3537.433, 'duration': 2.242}, {'end': 3541.116, 'text': 'This is on a frozen reg system.', 'start': 3539.695, 'duration': 1.421}, {'end': 3548.921, 'text': 'Language models are very similar to humans and what things they pay attention to.', 'start': 3544.438, 'duration': 4.483}], 'summary': 'Using public domain data for language model training can mitigate legal and compliance risks in deployments.', 'duration': 50.331, 'max_score': 3498.59, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg3498590.jpg'}], 'start': 2935.351, 'title': 'Transformer and language model retrieval', 'summary': 'Discusses potential architectural changes in transformer models with seven billion parameters for retrieval, including attention mechanisms and impact on knowledge-intensive tasks. 
it also explores efficient retrieval methods for language models, emphasizing inefficiency of long context models and trend towards agents learning when to retrieve.', 'chapters': [{'end': 3206.163, 'start': 2935.351, 'title': 'Transformer architectural changes for retrieval models', 'summary': 'Discusses the potential architectural changes in transformer models with seven billion parameters to better adapt to retrieval cases, including attention mechanisms, updating the retriever, and the impact on knowledge-intensive tasks.', 'duration': 270.812, 'highlights': ['The potential architectural changes in transformer models with seven billion parameters for better adaptation to retrieval cases, including attention mechanisms and updating the retriever.', 'The impact of specific data sets like Natural Questions, Wizard of Wikipedia, and Fever on the need to update the query encoder or document encoder.', 'The necessity of learning how to query into the index for achieving good performance in closed book scenarios.', 'The discussion on the potential need for hierarchical retrieval in transformer models.']}, {'end': 3565.352, 'start': 3206.503, 'title': 'Efficient retrieval methods for language models', 'summary': 'Discusses the efficiency of retrieval methods for language models, highlighting the inefficiency of long context models, the trend towards agents learning when to retrieve, and the use of safe training data for compliance and legal risk.', 'duration': 358.849, 'highlights': ['The inefficiency of long context language models is highlighted, with the RAG way being a more efficient solution, as attending over the entire context for a single question is super inefficient.', 'The trend towards agents learning when to retrieve and expand the compute budget on doing the retrieval is discussed, with the FLAIR paper presenting a method where the language model decides when and what to search for.', 'The use of safe training data, such as training a retrieval augmented 
language model on public domain data but allowing it to retrieve from a massive index during test time, is proposed as a solution to compliance and legal risks around language model deployments.', 'The phenomenon of context getting lost in the middle in frozen retrieval augmented language models is highlighted, as language models tend to pay attention to the first and last things retrieved but ignore the middle.']}], 'duration': 630.001, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg2935351.jpg', 'highlights': ['The inefficiency of long context language models is highlighted, with the RAG way being a more efficient solution, as attending over the entire context for a single question is super inefficient.', 'The trend towards agents learning when to retrieve and expand the compute budget on doing the retrieval is discussed, with the FLAIR paper presenting a method where the language model decides when and what to search for.', 'The use of safe training data, such as training a retrieval augmented language model on public domain data but allowing it to retrieve from a massive index during test time, is proposed as a solution to compliance and legal risks around language model deployments.', 'The potential architectural changes in transformer models with seven billion parameters for better adaptation to retrieval cases, including attention mechanisms and updating the retriever.', 'The impact of specific data sets like Natural Questions, Wizard of Wikipedia, and Fever on the need to update the query encoder or document encoder.', 'The necessity of learning how to query into the index for achieving good performance in closed book scenarios.', 'The discussion on the potential need for hierarchical retrieval in transformer models.', 'The phenomenon of context getting lost in the middle in frozen retrieval augmented language models is highlighted, as language models tend to pay attention to the first and last 
things retrieved but ignore the middle.']}, {'end': 3874.095, 'segs': [{'end': 3591.857, 'src': 'embed', 'start': 3566.633, 'weight': 0, 'content': [{'end': 3574.614, 'text': "So I think that's a very interesting observation, which kind of shows how brittle these systems can be, right?", 'start': 3566.633, 'duration': 7.981}, {'end': 3577.695, 'text': 'So if you have a frozen RAG system, it can be very, very brittle,', 'start': 3574.614, 'duration': 3.081}, {'end': 3584.836, 'text': 'where the order of the retrieved context matters a lot in whether you get the right answer or not.', 'start': 3577.695, 'duration': 7.141}, {'end': 3591.857, 'text': "it doesn't work on creating this as a really funny problem in the sense of not putting some vector and supposed to like,", 'start': 3584.836, 'duration': 7.021}], 'summary': 'A frozen RAG system can be brittle: the order of the retrieved context affects accuracy.', 'duration': 25.224, 'max_score': 3566.633, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg3566633.jpg'}, {'end': 3669.438, 'src': 'embed', 'start': 3625.559, 'weight': 1, 'content': [{'end': 3633.984, 'text': 'So I think the REPLUG solution is more elegant for solving that problem, because you actually get signal from the language model.', 'start': 3625.559, 'duration': 8.425}, {'end': 3636.445, 'text': "And if you just do REINFORCE, it's very high variance.", 'start': 3634.024, 'duration': 2.421}, {'end': 3641.528, 'text': "So it's going to be super finicky if you don't want to destroy your index.", 'start': 3636.485, 'duration': 5.043}, {'end': 3644.33, 'text': 'But people have tried it, yeah.', 'start': 3643.269, 'duration': 1.061}, {'end': 3664.452, 'text': "So there's some really nice work from OpenAI where they basically show, and again we're thinking more and more about agents here, something very similar to the FLARE results from earlier with active retrieval.", 'start': 3644.35, 
'duration': 20.102}, {'end': 3669.438, 'text': "that doesn't necessarily have to be some index that you only can read, just some web search.", 'start': 3664.452, 'duration': 4.986}], 'summary': "The REPLUG approach is more elegant and has lower variance than REINFORCE; OpenAI's work shows results similar to FLARE with active retrieval.", 'duration': 43.879, 'max_score': 3625.559, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg3625559.jpg'}, {'end': 3736.882, 'src': 'embed', 'start': 3691.144, 'weight': 4, 'content': [{'end': 3696.267, 'text': 'So, rather than just retrieval-augmenting language models, we can tool-augment language models,', 'start': 3691.144, 'duration': 5.123}, {'end': 3699.89, 'text': 'and retrieval is just one of the many tools that language models have access to.', 'start': 3696.267, 'duration': 3.623}, {'end': 3704.313, 'text': 'We can have re-rankers and things on top of the outputs of these tools.', 'start': 3700.47, 'duration': 3.843}, {'end': 3711.678, 'text': 'And so one of the big questions, I think, is how do you actually get the system to learn stuff, right?', 'start': 3705.454, 'duration': 6.224}, {'end': 3716.661, 'text': "So we're going to need RL if we want the system to really learn how to take these actions properly.", 'start': 3711.698, 'duration': 4.963}, {'end': 3728.353, 'text': "And so yeah, this has been taken to the extreme in this sort of Self-RAG architecture, where they have this retrieval step and it's active,", 'start': 3720.185, 'duration': 8.168}, {'end': 3733.519, 'text': 'and then you criticize it and then you basically do some natural language inference,', 'start': 3728.353, 'duration': 5.166}, {'end': 3736.882, 'text': 'and all of that just with one language model to answer the questions.', 'start': 3733.519, 'duration': 3.363}], 'summary': 'Augmenting language models with tools like retrieval and re-rankers, and using RL so the system learns when to use them.', 'duration': 45.738, 'max_score': 3691.144, 
'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg3691144.jpg'}, {'end': 3783.648, 'src': 'embed', 'start': 3757.823, 'weight': 3, 'content': [{'end': 3765.104, 'text': 'But the instruction tuning has almost always only happened on the language model and not on the entire system.', 'start': 3757.823, 'duration': 7.281}, {'end': 3777.187, 'text': 'So I think one of the interesting things that people are looking at now, with things like RA-DIT and InstructRetro, is how can we instruction fine-tune an entire retrieval augmented system, all the way into the retrieval step?', 'start': 3765.104, 'duration': 12.083}, {'end': 3783.648, 'text': "Can we generate data so that that also follows the instructions properly, which currently doesn't happen in any of these model architectures?", 'start': 3777.187, 'duration': 6.461}], 'summary': 'Exploring instruction fine-tuning for retrieval augmented systems to improve adherence to instructions.', 'duration': 25.825, 'max_score': 3757.823, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg3757823.jpg'}], 'start': 3566.633, 'title': 'Language model retrieval and augmentation', 'summary': 'Explores the fragility of frozen RAG systems, reinforcement learning in language models, tool augmentation, instruction tuning, and advancements in retrieval augmented systems. 
it covers topics such as lama index, langchain, child-parent recursive retrievers, and zero-shot large language model re-ranker.', 'chapters': [{'end': 3736.882, 'start': 3566.633, 'title': 'Retrieval and augmentation in language models', 'summary': 'Discusses the fragility of frozen rack systems, the use of reinforce for solving language model problems, the elegant replug solution, and the potential of tool augmentation in language models with a focus on reinforcement learning and self-reg architecture.', 'duration': 170.249, 'highlights': ['The replug solution is more elegant for solving the problem, providing signal from the language model and addressing the high variance issue of using reinforce.', "OpenAI's work demonstrates the potential of active retrieval without relying solely on a fixed index, expanding the capabilities of language models.", 'The discussion emphasizes the need for reinforcement learning to enable language models to properly learn and take actions, as exemplified in the self-reg architecture.', "The fragility of frozen rack systems is highlighted, emphasizing the significant impact of the order of retrieved context on the accuracy of the model's output."]}, {'end': 3874.095, 'start': 3738.875, 'title': 'Instruction tuning and advanced rag in retrieval augmented systems', 'summary': 'Discusses the importance of instruction tuning in retrieval augmented systems and the advancements in advanced rag, including frameworks like lama index and langchain, frozen rag capabilities, and innovative retrieval strategies such as child-parent recursive retrievers and zero-shot large language model re-ranker.', 'duration': 135.22, 'highlights': ['Instruction tuning plays a crucial role in making retrieval augmented systems work effectively, with a focus on fine-tuning the entire system rather than just the language model.', 'The developer community has made significant advancements in retrieval augmented systems, including frameworks like Lama Index and 
Langchain, as well as open source vector databases like Chroma and Weaviate, to simplify the implementation of RAG.', 'In the realm of frozen RAG, innovative retrieval strategies such as child-parent recursive retrievers, hybrid search with reciprocal rank fusion, zero-shot large language model re-ranker, and hypothetical document embeddings are contributing to the enhancement of retrieval augmented systems.']}], 'duration': 307.462, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg3566633.jpg', 'highlights': ['The fragility of frozen rack systems is emphasized, impacting the accuracy of model output.', "OpenAI's work demonstrates the potential of active retrieval, expanding language model capabilities.", 'Replug solution provides signal from language model, addressing high variance issue.', 'Instruction tuning is crucial for effective retrieval augmented systems.', 'Advancements in retrieval augmented systems include frameworks like Lama Index and Langchain.', 'Reinforcement learning is needed for language models to properly learn and take actions.']}, {'end': 4758.414, 'segs': [{'end': 4043.179, 'src': 'embed', 'start': 3977.277, 'weight': 1, 'content': [{'end': 3983.201, 'text': 'but maybe this is a bit of a critique of, uh, maybe silicon valley investment strategies and things like that.', 'start': 3977.277, 'duration': 5.924}, {'end': 3989.125, 'text': 'but a lot of these um vector database companies are basically becoming database companies now.', 'start': 3983.201, 'duration': 5.924}, {'end': 3996.59, 'text': 'so they are adding all this sparse stuff because the the dense thing is not enough and as it turns out, there are a lot of pretty good,', 'start': 3989.125, 'duration': 7.465}, {'end': 4000.494, 'text': 'uh sparse databases out there already, like Postgres and things like that.', 'start': 3996.59, 'duration': 3.904}, {'end': 4004.037, 'text': "And they're also all adding vectors to their 
databases.", 'start': 4000.574, 'duration': 3.463}, {'end': 4008.542, 'text': "So I think that's all going to kind of coalesce into databases.", 'start': 4004.077, 'duration': 4.465}, {'end': 4016.961, 'text': 'So I think there are some interesting things to look at for kind of the data.', 'start': 4012.338, 'duration': 4.623}, {'end': 4023.785, 'text': 'So through this instruction problem, can we generate much better data for training rag systems synthetically?', 'start': 4017.001, 'duration': 6.784}, {'end': 4029.708, 'text': "And then I think there's this massive open question around how we actually measure whether the rag system is any good.", 'start': 4024.785, 'duration': 4.923}, {'end': 4037.693, 'text': "So right now we just look at downstream performance, which is sort of okay, but if you mess up the retrieval, it's very hard to measure.", 'start': 4030.009, 'duration': 7.684}, {'end': 4043.179, 'text': 'But how to measure whether your retrieval is right is also very difficult.', 'start': 4039.014, 'duration': 4.165}], 'summary': 'Silicon valley investments shifting to sparse databases with added vectors, raising questions on data quality and evaluation.', 'duration': 65.902, 'max_score': 3977.277, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg3977277.jpg'}, {'end': 4152.764, 'src': 'embed', 'start': 4115.3, 'weight': 0, 'content': [{'end': 4121.942, 'text': "So I think that's really like, if you look at the trend in the field, multimodality with GPT-4 or V and things like that is really a hot topic.", 'start': 4115.3, 'duration': 6.642}, {'end': 4124.322, 'text': 'So everything is kind of going in that direction.', 'start': 4121.982, 'duration': 2.34}, {'end': 4126.603, 'text': "So it's an interesting thing to think about.", 'start': 4124.962, 'duration': 1.641}, {'end': 4137.191, 'text': 'um, so overall i think, uh, it would be nice if everybody sort of moves away from rag 1.0, the frozen 
frankenstein rag,', 'start': 4128.243, 'duration': 8.948}, {'end': 4141.194, 'text': 'and moves towards this much more kind of optimized version, rag 2.0.', 'start': 4137.191, 'duration': 4.003}, {'end': 4143.196, 'text': "so it's really about systems over models.", 'start': 4141.194, 'duration': 2.002}, {'end': 4147.158, 'text': "right, it's not just your language model and your retriever, and they're kind of separate.", 'start': 4143.196, 'duration': 3.962}, {'end': 4152.764, 'text': "it's about thinking from the from a systems perspective, about the entire thing and the problem you're trying to solve.", 'start': 4147.158, 'duration': 5.606}], 'summary': 'Multimodality with gpt-4/v is a hot topic, advocating for moving from rag 1.0 to optimized version rag 2.0 for systems over models.', 'duration': 37.464, 'max_score': 4115.3, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg4115300.jpg'}, {'end': 4229.845, 'src': 'embed', 'start': 4203.364, 'weight': 2, 'content': [{'end': 4208.086, 'text': 'So what you want to do is make it much more efficient and have the right cost quality trade-off.', 'start': 4203.364, 'duration': 4.722}, {'end': 4211.747, 'text': 'And the easiest way I can think of is to do it through retrieval augmentation.', 'start': 4208.246, 'duration': 3.501}, {'end': 4213.648, 'text': "But obviously I'm very biased.", 'start': 4212.127, 'duration': 1.521}, {'end': 4218.221, 'text': 'So yeah, that was all I had actually.', 'start': 4215.8, 'duration': 2.421}, {'end': 4224.743, 'text': "So, if you're interested in this, I'm at Stanford, so I can work with you on research projects on these topics.", 'start': 4218.561, 'duration': 6.182}, {'end': 4228.344, 'text': 'or, if you want, you can also join Contextual, because we work on this stuff every day.', 'start': 4224.743, 'duration': 3.601}, {'end': 4229.845, 'text': 'Thank you.', 'start': 4229.545, 'duration': 0.3}], 'summary': 'Improve efficiency 
and cost quality trade-off through retrieval augmentation, offering research collaboration or employment.', 'duration': 26.481, 'max_score': 4203.364, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg4203364.jpg'}, {'end': 4587.51, 'src': 'embed', 'start': 4556.291, 'weight': 4, 'content': [{'end': 4560.351, 'text': 'So a lot of people are conflating hallucination with correctness or incorrectness.', 'start': 4556.291, 'duration': 4.06}, {'end': 4562.372, 'text': "So they're like, oh, the model made a mistake.", 'start': 4560.711, 'duration': 1.661}, {'end': 4563.072, 'text': 'It hallucinated.', 'start': 4562.452, 'duration': 0.62}, {'end': 4564.592, 'text': "It's like, no, it made a mistake.", 'start': 4563.112, 'duration': 1.48}, {'end': 4566.853, 'text': "It's different from hallucination.", 'start': 4565.552, 'duration': 1.301}, {'end': 4568.733, 'text': 'Hallucination, I think, is very specific.', 'start': 4566.873, 'duration': 1.86}, {'end': 4573.614, 'text': 'I retrieved something, so I have some sort of counterfactual ground truth.', 'start': 4569.373, 'duration': 4.241}, {'end': 4577.555, 'text': "And what I'm saying does not correspond to that ground truth.", 'start': 4573.694, 'duration': 3.861}, {'end': 4587.51, 'text': "And so, yeah, I think there's a bunch of folks at Stanford also working on better measurements of hallucination and definitions and things like that.", 'start': 4578.826, 'duration': 8.684}], 'summary': 'Hallucination in ai models is different from making mistakes; it needs better measurements and definitions.', 'duration': 31.219, 'max_score': 4556.291, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg4556291.jpg'}], 'start': 3875.917, 'title': 'Future of rag and multimodality', 'summary': 'Discusses the future of rag systems, open questions about pre-training, scaling laws, vector databases, system performance measurement, 
and the shift towards multimodality with gpt-4 or v, emphasizing the shift towards rag 2.0 optimized for systems over models.', 'chapters': [{'end': 4184.796, 'start': 3875.917, 'title': 'Future of rag and multimodality', 'summary': 'Discusses the future of rag systems, including open questions about pre-training, scaling laws, vector databases, measurement of system performance, and the shift towards multimodality with gpt-4 or v. it also emphasizes the shift towards rag 2.0, optimized for systems over models.', 'duration': 308.879, 'highlights': ['The shift towards RAG 2.0, optimized for systems over models, is emphasized, indicating the direction of progress in deep learning (Relevance: 5)', 'The discussion on the future of RAG systems and the open questions around pre-training, scaling laws, and vector databases provides insights into the potential impact of student research and development in this field (Relevance: 4)', 'The exploration of multimodality with GPT-4 or V and the trend in this direction is highlighted, indicating an important and evolving area in the field (Relevance: 3)', 'The challenges in measuring system performance and the open question of how to determine the quality of retrieval systems are discussed, presenting an important problem to address in this domain (Relevance: 2)', 'The critique of current investment strategies in vector database companies and the potential coalescence of databases into a single system sheds light on the evolving landscape of database technologies (Relevance: 1)']}, {'end': 4758.414, 'start': 4186.777, 'title': 'Efficient language models and retrieval augmentation', 'summary': 'Discusses the need for efficient language models and retrieval augmentation to achieve a trade-off between cost and quality, with a focus on tuning transformer architecture, exploring light convolutions, and the use of retrieval hardware. 
it also addresses the challenges of hallucination in language models and the need for better measurements and definitions.', 'duration': 571.637, 'highlights': ['Efficient language models and retrieval augmentation are crucial for achieving a trade-off between cost and quality, particularly through tuning transformer architecture, exploring light convolutions, and the use of retrieval hardware.', 'Challenges of hallucination in language models are addressed, emphasizing the need for better measurements and definitions to distinguish it from correctness or mistakes.', 'The discussion also delves into the potential of adapting retrieval augmented systems through fine-tuning and the role of different indices and definitions of ground truth in language models.', 'The potential of tuning transformer architecture, exploring light convolutions, and the use of retrieval hardware is emphasized for enhancing the efficiency and cost-effectiveness of language models.']}], 'duration': 882.497, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mE7IDf2SmJg/pics/mE7IDf2SmJg3875917.jpg', 'highlights': ['The shift towards RAG 2.0, optimized for systems over models, is emphasized, indicating the direction of progress in deep learning (Relevance: 5)', 'The discussion on the future of RAG systems and the open questions around pre-training, scaling laws, and vector databases provides insights into the potential impact of student research and development in this field (Relevance: 4)', 'Efficient language models and retrieval augmentation are crucial for achieving a trade-off between cost and quality, particularly through tuning transformer architecture, exploring light convolutions, and the use of retrieval hardware (Relevance: 3)', 'The exploration of multimodality with GPT-4 or V and the trend in this direction is highlighted, indicating an important and evolving area in the field (Relevance: 3)', 'Challenges of hallucination in language models are addressed, 
emphasizing the need for better measurements and definitions to distinguish it from correctness or mistakes (Relevance: 2)', 'The challenges in measuring system performance and the open question of how to determine the quality of retrieval systems are discussed, presenting an important problem to address in this domain (Relevance: 2)', 'The critique of current investment strategies in vector database companies and the potential coalescence of databases into a single system sheds light on the evolving landscape of database technologies (Relevance: 1)']}], 'highlights': ['The shift towards RAG 2.0, optimized for systems over models, is emphasized, indicating the direction of progress in deep learning', 'The discussion on the future of RAG systems and the open questions around pre-training, scaling laws, and vector databases provides insights into the potential impact of student research and development in this field', 'Efficient language models and retrieval augmentation are crucial for achieving a trade-off between cost and quality, particularly through tuning transformer architecture, exploring light convolutions, and the use of retrieval hardware', 'The exploration of multimodality with GPT-4 or V and the trend in this direction is highlighted, indicating an important and evolving area in the field', 'The challenges of hallucination in language models are addressed, emphasizing the need for better measurements and definitions to distinguish it from correctness or mistakes', 'The challenges in measuring system performance and the open question of how to determine the quality of retrieval systems are discussed, presenting an important problem to address in this domain', 'The inefficiency of long context language models is highlighted, with the RAG way being a more efficient solution, as attending over the entire context for a single question is super inefficient', 'The trend towards agents learning when to retrieve and expand the compute budget on doing the 
retrieval is discussed, with the FLARE paper presenting a method where the language model decides when and what to search for', 'The use of safe training data, such as training a retrieval augmented language model on public domain data but allowing it to retrieve from a massive index during test time, is proposed as a solution to compliance and legal risks around language model deployments', 'The potential architectural changes in transformer models with seven billion parameters for better adaptation to retrieval cases, including attention mechanisms and updating the retriever', 'The impact of specific data sets like Natural Questions, Wizard of Wikipedia, and FEVER on the need to update the query encoder or document encoder', 'The necessity of learning how to query into the index for achieving good performance in closed book scenarios', 'The discussion on the potential need for hierarchical retrieval in transformer models', 'The phenomenon of context getting lost in the middle in frozen retrieval augmented language models is highlighted, as language models tend to pay attention to the first and last things retrieved but ignore the middle', "OpenAI's work demonstrates the potential of active retrieval, expanding language model capabilities", 'The REPLUG approach trains the retriever with signal from the language model, avoiding the high variance of REINFORCE', 'Instruction tuning is crucial for effective retrieval augmented systems', 'Advancements in retrieval augmented systems include frameworks like LlamaIndex and LangChain', 'Reinforcement learning is needed for language models to properly learn and take actions', 'The recent work from NVIDIA called Retro++ combines the retro architecture with elements of RAG, showing promising results but highlighting the challenge of reproducing and validating the effectiveness of these models', 'The Retro architecture, when exploited by the paper called Retro out of DeepMind, resulted in a 25 times smaller retrieval augmented language model trained from 
scratch, outperforming a 25 times bigger language model on the same data in terms of perplexity', 'The Fusion-in-Decoder approach scales to a much higher number of passages, leading to corresponding improvements in the scores', 'The FAISS library powers modern vector databases for efficient retrieval techniques', 'The efficiency of dot products in vector databases and the trend towards hybrid search are emphasized as key points in the chapter', 'The Siamese network involves using two different BERT models and computing a dot product to obtain a single score, while late interaction and DRAGON introduce more complex scoring methods for retrieval', 'SPLADE addresses the issue of handling synonyms in sparse models by combining sparse and dense representations for more efficient search, while DRAGON is highlighted as a recommended dense retriever with progressive data augmentation', "The use of dense retrievers for document retrieval and the importance of precise word overlap for things like brand names, as exemplified by the case of the brand 'Apple' and avoiding retrieval of unrelated information about 'pears'", 'The concept of contextualizing the retriever for the generator, as demonstrated by the approach of using a dense retriever for document retrieval and a re-ranker to backprop into a BERT model, resulting in a more optimized system for contextualized use', 'The discussion on methods for optimizing both the retriever and the generator, leading to a more cohesive and contextualized architecture where everything works together, as exemplified by the RAG (Retrieval Augmented Generation) model and its approach to updating both the retriever and the generator', 'The developer community is currently engaging in hybrid search, combining results from different retrieval techniques to achieve improved rankings', 'A paper from 2021 discusses the reduction of hallucination through retrieval augmentation, providing a benchmark for evaluating the impact on closed book 
question answering', 'The advantages of dense retrieval over sparse retrieval include semantic similarity and the ability to find relevant documents based on synonyms, as demonstrated by models like BERT, ORQA, and DPR', 'During training time, important considerations include deciding which components of the system to update, how to update them, and whether to pre-train the system from scratch or initialize it with separately trained components', 'The use of dense retrieval models like BERT, ORQA, and DPR as latent variables in the system, and the importance of pre-training the retriever on relevant information for effective system training', 'During test time, there are opportunities to manipulate the system, such as giving different indices or altering the sampling process', 'TF-IDF is a sparse retrieval method that uses term frequency and inverse document frequency to score document-query overlap, with variants like BM25 for better scoring, and is widely used in systems like DrQA for open domain question answering', 'The concept of frozen RAG represents a system without any training, functioning solely during test time', 'The emergence of powerful language models through parameterizing massive neural nets', 'The challenges around productionizing language models, particularly related to issues like hallucination and unknown attributions', 'The importance of fixing the user interface for language models, as demonstrated by ChatGPT', 'The need for language models to remain up-to-date and never go stale, and the challenges of model editing and customization for different use cases and data', 'The use of RAG to contextualize language models by incorporating external context for better performance and grounding', 'The misconception about the invention of language models is addressed, elucidating that it is not a recent idea and was introduced several decades ago', 'The distinction between parametric and semi-parametric approaches in language model systems, and the 
benefits of RAG in allowing customization and updates to the index for better performance and grounding', 'Douwe Kiela is the CEO of Contextual AI, an enterprise LLM company, and an adjunct professor in symbolic systems at Stanford, with previous roles at Hugging Face and Facebook AI Research', "He holds a PhD and master's from the University of Cambridge and a master's in logic from the University of Amsterdam, and studied philosophy and cognitive AI as an undergraduate", 'His work centers around machine learning and NLP, emphasizing the development of better models for language understanding and generation, as well as improved tools for evaluation']}