title
2. Building Systems with the ChatGPT API | Andrew Ng | DeepLearning.ai - Full Course

description
The course comes from [https://learn.deeplearning.ai/chatgpt-building-system/lesson/1/introduction](https://learn.deeplearning.ai/chatgpt-building-system/lesson/1/introduction) The course will be led by Andrew Ng This video course explains how to build a system using the ChatGPT API. Unlike previous lessons on how to prompt ChatGPT, the system requires more of a single prompt or a single call to an LLM (large language model). The course takes the example of building an end-to-end customer service assistance system that works by making multiple calls to a language model, using different instructions based on the output of previous calls, and sometimes even looking up information from an external source. The course emphasizes that applications often require multiple internal steps, which are not visible to the end user. In addition, in the long term, building complex systems using LLMs often requires continuous system improvement. The authors also share the process for developing LLM applications and best practices for evaluating and improving systems over time. Get free course notes: https://t.me/NoteForYoutubeCourse

detail
{'title': '2. Building Systems with the ChatGPT API | Andrew Ng | DeepLearning.ai - Full Course', 'heatmap': [], 'summary': 'Explores building an end-to-end customer service system using chatgpt api, training large language models, including instruction-tuned llm, optimizing api usage, and discussing ai content moderation and reasoning, user message processing, model reasoning, product categorization, and system performance evaluation, achieving 90% correctness in the development set and qualitative assessment scores of a and d.', 'chapters': [{'end': 364.574, 'segs': [{'end': 150.774, 'src': 'embed', 'start': 122.214, 'weight': 0, 'content': [{'end': 128.638, 'text': 'And from the deeplearning.ai team, thank you also to Jeff Ludwig, Eddie Hsu, and Tommy Nelson.', 'start': 122.214, 'duration': 6.424}, {'end': 133.361, 'text': "Through this short course, we hope you'll come away confident in your abilities to build a complex,", 'start': 129.377, 'duration': 3.984}, {'end': 137.523, 'text': 'multi-step application and also be set up to maintain and keep on improving it.', 'start': 133.361, 'duration': 4.162}, {'end': 138.824, 'text': "Let's dive in.", 'start': 138.223, 'duration': 0.601}, {'end': 150.774, 'text': "In this first video, I'd like to share with you an overview of how LLMs large language models work.", 'start': 145.273, 'duration': 5.501}], 'summary': 'Learn to build complex applications confidently with llms in this short course.', 'duration': 28.56, 'max_score': 122.214, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE122214.jpg'}, {'end': 285.653, 'src': 'embed', 'start': 260.632, 'weight': 2, 'content': [{'end': 267.162, 'text': 'Specifically, a large language model can be built by using supervised learning to repeatedly predict the next word.', 'start': 260.632, 'duration': 6.53}, {'end': 276.348, 'text': "Let's say that, in your training sets of a lot of text data, you have the sentence, my favorite food is a bagel with cream cheese and lox.", 'start': 267.963, 'duration': 8.385}, {'end': 285.653, 'text': 'Then this sentence is turned into a sequence of training examples where, given a sentence fragment, my favorite food is A.', 'start': 277.308, 'duration': 8.345}], 'summary': 'Large language model built using supervised learning to predict next word.', 'duration': 25.021, 'max_score': 260.632, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE260632.jpg'}, {'end': 331.443, 'src': 'embed', 'start': 305.513, 'weight': 4, 'content': [{'end': 317.737, 'text': 'you can then create a massive training set where you can start off with part of a sentence or part of a piece of text and repeatedly ask the language model to learn to predict what is the next word.', 'start': 305.513, 'duration': 12.224}, {'end': 323.959, 'text': 'So today there are broadly two, major types of large language models.', 'start': 318.357, 'duration': 5.602}, {'end': 326.26, 'text': 'The first is a base LLM.', 'start': 324.759, 'duration': 1.501}, {'end': 331.443, 'text': 'And the second, which is what is increasingly used, is the instruction-tuned LLM.', 'start': 326.9, 'duration': 4.543}], 'summary': 'Training set creation for language models using base and instruction-tuned llms.', 'duration': 25.93, 'max_score': 305.513, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE305513.jpg'}], 'start': 0.129, 'title': 'Building and training llms', 'summary': 'Explores building an end-to-end customer service system using chatgpt api, including chaining multiple calls to a language model. it delves into supervised learning for training large language models, creating massive training sets from hundreds of billions of words, and discusses base llm and instruction-tuned llm.', 'chapters': [{'end': 243.599, 'start': 0.129, 'title': 'Building with chatgpt api', 'summary': 'Discusses best practices for building a complex application using an llm, focusing on an end-to-end customer service assistance system that chains multiple calls to a language model and the process of developing and improving an llm-based application, with gratitude expressed to the contributors. it also provides an overview of how llms work, including their training process and the use of supervised learning for training.', 'duration': 243.47, 'highlights': ['The chapter discusses best practices for building a complex application using an LLM, focusing on an end-to-end customer service assistance system that chains multiple calls to a language model and the process of developing and improving an LLM-based application, with gratitude expressed to the contributors. The course shares best practices for building a complex application using an LLM, with a focus on an end-to-end customer service assistance system that chains multiple calls to a language model and emphasizes the process of developing and improving an LLM-based application, expressing gratitude to the contributors.', 'It provides an overview of how LLMs work, including their training process and the use of supervised learning for training. The chapter provides an overview of how LLMs work, covering their training process and the use of supervised learning for training, using the example of classifying the sentiment of restaurant reviews.']}, {'end': 364.574, 'start': 243.599, 'title': 'Supervised learning for large language models', 'summary': 'Discusses how supervised learning is used to train large language models, creating massive training sets from hundreds of billions of words, and the two major types of large language models: base llm and instruction-tuned llm.', 'duration': 120.975, 'highlights': ['Supervised learning is a core building block for training large language models by using it to repeatedly predict the next word. Supervised learning is crucial for training large language models, allowing the model to predict the next word by repeatedly using training examples, such as predicting the next word in a sentence fragment.', 'Creating massive training sets from hundreds of billions of words to train large language models. The process involves creating a massive training set by repeatedly asking the language model to learn to predict the next word from part of a sentence or piece of text, using hundreds of billions of words.', 'Two major types of large language models: base LLM and instruction-tuned LLM. The two main types are base LLM, which predicts the next word based on text training data, and instruction-tuned LLM, which is increasingly used for training large language models.']}], 'duration': 364.445, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE129.jpg', 'highlights': ['The chapter discusses best practices for building a complex application using an LLM, focusing on an end-to-end customer service assistance system that chains multiple calls to a language model and the process of developing and improving an LLM-based application, with gratitude expressed to the contributors.', 'It provides an overview of how LLMs work, including their training process and the use of supervised learning for training.', 'Supervised learning is a core building block for training large language models by using it to repeatedly predict the next word.', 'Creating massive training sets from hundreds of billions of words to train large language models.', 'Two major types of large language models: base LLM and instruction-tuned LLM.']}, {'end': 965.302, 'segs': [{'end': 488.622, 'src': 'embed', 'start': 391.627, 'weight': 0, 'content': [{'end': 396.331, 'text': 'You first train a base LLM on a lot of data, so hundreds of billions of words, maybe even more.', 'start': 391.627, 'duration': 4.704}, {'end': 401.875, 'text': 'And this is a process that can take months on a large supercomputing system.', 'start': 396.811, 'duration': 5.064}, {'end': 405.217, 'text': "After you've trained the base LOM.", 'start': 403.016, 'duration': 2.201}, {'end': 415.482, 'text': 'you would then further train the model by fine-tuning it on a smaller set of examples where the output follows an input instruction.', 'start': 405.217, 'duration': 10.265}, {'end': 425.067, 'text': 'And so, for example, you may have contractors help you write a lot of examples of an instruction and then a good response to an instruction.', 'start': 415.722, 'duration': 9.345}, {'end': 429.429, 'text': 'And that creates a training set to carry out this additional fine tuning.', 'start': 425.587, 'duration': 3.842}, {'end': 433.531, 'text': "So that learns to predict what is the next word if it's trying to follow an instruction.", 'start': 429.489, 'duration': 4.042}, {'end': 438.955, 'text': "After that, to improve the quality of the LLM's output.", 'start': 434.873, 'duration': 4.082}, {'end': 448.479, 'text': 'a common process now is to obtain human ratings of the quality of many different LLM outputs, on criteria such as whether the output is helpful,', 'start': 438.955, 'duration': 9.524}, {'end': 449.599, 'text': 'honest and harmless.', 'start': 448.479, 'duration': 1.12}, {'end': 457.823, 'text': 'And you can then further tune the LLM to increase the probability of us generating the more highly rated outputs.', 'start': 450.48, 'duration': 7.343}, {'end': 462.925, 'text': 'And the most common technique to do this is ROHF, which stands for reinforcement learning from human feedback.', 'start': 457.923, 'duration': 5.002}, {'end': 467.549, 'text': 'And whereas training the base LLM can take months,', 'start': 464.525, 'duration': 3.024}, {'end': 479.484, 'text': 'the process of going from the base LLM to the instruction to an LLM can be done in maybe days on much more- on a much more modest size datasets and much more modest size computational resources.', 'start': 467.549, 'duration': 11.935}, {'end': 482.7, 'text': 'So this is how you would use an LLM.', 'start': 480.259, 'duration': 2.441}, {'end': 484.781, 'text': "I'm gonna import a few libraries.", 'start': 482.72, 'duration': 2.061}, {'end': 488.622, 'text': "I'm going to load my OpenAI key here.", 'start': 485.581, 'duration': 3.041}], 'summary': 'Train base llm on hundreds of billions of words, fine-tune with human feedback, and improve output quality with reinforcement learning.', 'duration': 96.995, 'max_score': 391.627, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE391627.jpg'}, {'end': 775.285, 'src': 'embed', 'start': 699.909, 'weight': 5, 'content': [{'end': 704.413, 'text': 'this nifty trick helps it to better see the individual letters of the words.', 'start': 699.909, 'duration': 4.504}, {'end': 712.92, 'text': 'For the English language, one token roughly on average corresponds to about four characters or about three-quarters of a word.', 'start': 706.074, 'duration': 6.846}, {'end': 722.167, 'text': 'And so different large language models will often have different limits on the number of input plus output tokens it can accept.', 'start': 714.042, 'duration': 8.125}, {'end': 726.791, 'text': 'The input is often called the context and the output is often called the completion.', 'start': 722.368, 'duration': 4.423}, {'end': 737.237, 'text': 'And the model GPT 3.5 Turbo for example, the most commonly used chat GPT model, has a limit of roughly 4, 000 tokens in the input plus output.', 'start': 727.591, 'duration': 9.646}, {'end': 743.722, 'text': "So if you try to feed it an input context that's much longer than this, it will actually throw an exception and generate an error.", 'start': 738.298, 'duration': 5.424}, {'end': 757.426, 'text': 'Next, I want to share with you another powerful way to use an LLM API, which involves specifying separate system user and assistant messages.', 'start': 744.952, 'duration': 12.474}, {'end': 764.794, 'text': "Let me show you an example, then we can explain in more detail what it's actually doing.", 'start': 759.148, 'duration': 5.646}, {'end': 769.702, 'text': "Here's a new helper function called getCompletionFromMessages.", 'start': 766.38, 'duration': 3.322}, {'end': 775.285, 'text': 'And when we prompt this LLM, we are going to give it multiple messages.', 'start': 770.462, 'duration': 4.823}], 'summary': 'Large language models have input plus output token limits, e.g., gpt 3.5 turbo has a limit of roughly 4,000 tokens.', 'duration': 75.376, 'max_score': 699.909, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE699909.jpg'}], 'start': 366.294, 'title': 'Training and tokenization of llms', 'summary': 'Covers the training process for an instruction-tuned llm, involving initial training on large datasets, fine-tuning on specific examples, obtaining human ratings for output quality, and the use of reinforcement learning. additionally, it explores the tokenization process of large language models, highlighting challenges, techniques, and limitations. it also discusses the application of llm api with examples.', 'chapters': [{'end': 518.438, 'start': 366.294, 'title': 'Training an instruction-tuned llm', 'summary': 'Explains the process of training an instruction-tuned llm, including the initial training on large datasets, fine-tuning on specific examples, obtaining human ratings for output quality, and the use of reinforcement learning, which reduces the time required for the process and the computational resources needed.', 'duration': 152.144, 'highlights': ['The process of going from the base LLM to the instruction-tuned LLM can be done in maybe days on much more modest size datasets and much more modest size computational resources. Transition from base LLM to instruction-tuned LLM is significantly faster and requires fewer computational resources.', 'The base LLM is initially trained on hundreds of billions of words, a process that can take months on a large supercomputing system. The base LLM undergoes extensive training on a vast amount of data over a prolonged period, potentially taking months.', "Obtaining human ratings of the quality of many different LLM outputs is a common process to improve the model's output quality. Human ratings are utilized to enhance the quality of LLM outputs.", 'Fine-tuning the model on a smaller set of examples where the output follows an input instruction is a crucial step in the process. The model is further trained by fine-tuning it on specific examples to align output with input instructions.', 'Reinforcement learning from human feedback (ROHF) is a common technique used to increase the probability of generating highly rated outputs. ROHF is employed to enhance the likelihood of generating top-rated outputs through reinforcement learning from human feedback.']}, {'end': 965.302, 'start': 520.14, 'title': 'Large language models: tokenization and applications', 'summary': 'Explains the tokenization process of large language models, highlighting the challenges faced, the tokenization technique, and the limitations of gpt 3.5 turbo with a token limit of roughly 4,000. it also explores a powerful way to use llm api by specifying separate system user and assistant messages and provides examples of its application.', 'duration': 445.162, 'highlights': ['The tokenization process of large language models is explained, highlighting how sequences of characters are grouped into tokens, with one token corresponding to about four characters or three-quarters of a word.', 'GPT 3.5 Turbo, the most commonly used chat GPT model, has a limit of roughly 4,000 tokens in the input plus output, exceeding which will generate an error.', 'A powerful way to use LLM API involves specifying separate system user and assistant messages, allowing for specific instructions and tone setting, as demonstrated by examples like prompting a Dr. Seuss style poem or setting the response length to one sentence.']}], 'duration': 599.008, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE366294.jpg', 'highlights': ['Transition from base LLM to instruction-tuned LLM is significantly faster and requires fewer computational resources.', 'The base LLM undergoes extensive training on a vast amount of data over a prolonged period, potentially taking months.', 'Human ratings are utilized to enhance the quality of LLM outputs.', 'The model is further trained by fine-tuning it on specific examples to align output with input instructions.', 'ROHF is employed to enhance the likelihood of generating top-rated outputs through reinforcement learning from human feedback.', 'Sequences of characters are grouped into tokens, with one token corresponding to about four characters or three-quarters of a word.', 'GPT 3.5 Turbo has a limit of roughly 4,000 tokens in the input plus output, exceeding which will generate an error.', 'A powerful way to use LLM API involves specifying separate system user and assistant messages, allowing for specific instructions and tone setting.']}, {'end': 1549.123, 'segs': [{'end': 1055.483, 'src': 'embed', 'start': 1023.455, 'weight': 2, 'content': [{'end': 1026.607, 'text': '000 or so token limits of chat GPT,', 'start': 1023.455, 'duration': 3.152}, {'end': 1033.972, 'text': "in which case you could double-check how many tokens it was and truncate it to make sure you're staying within the input token limits of the large language model.", 'start': 1026.607, 'duration': 7.365}, {'end': 1040.346, 'text': 'Now, I want to share with you one more tip for how to use a large language model.', 'start': 1034.98, 'duration': 5.366}, {'end': 1048.435, 'text': "Calling the OpenAI API requires using an API key that's tied to either a free or a paid account.", 'start': 1041.207, 'duration': 7.228}, {'end': 1055.483, 'text': 'And so, many developers will write the API key in plain text like this into their Jupyter notebook.', 'start': 1049.056, 'duration': 6.427}], 'summary': 'Use openai api with token limits, and secure api key in jupyter notebook.', 'duration': 32.028, 'max_score': 1023.455, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE1023455.jpg'}, {'end': 1130.036, 'src': 'embed', 'start': 1100.174, 'weight': 1, 'content': [{'end': 1112.269, 'text': 'env that contains my API key and this loads it into the operating systems, environmental variable and then os.getEnv open API key,', 'start': 1100.174, 'duration': 12.095}, {'end': 1114.05, 'text': 'stores it into this variable.', 'start': 1112.269, 'duration': 1.781}, {'end': 1122.993, 'text': "And in this whole process, I don't ever have to enter the API key in plain text and unencrypted plain text into my Jupyter notebook.", 'start': 1114.69, 'duration': 8.303}, {'end': 1128.915, 'text': 'So this is a relatively more secure and a better way to access the API key.', 'start': 1123.593, 'duration': 5.322}, {'end': 1130.036, 'text': 'And in fact,', 'start': 1129.415, 'duration': 0.621}], 'summary': 'Api key securely loaded into environment variable for jupyter notebook access.', 'duration': 29.862, 'max_score': 1100.174, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE1100174.jpg'}, {'end': 1265.416, 'src': 'embed', 'start': 1230.579, 'weight': 0, 'content': [{'end': 1237.465, 'text': 'So there are applications that used to take me maybe six months or a year to build, that you can now build in minutes or hours,', 'start': 1230.579, 'duration': 6.886}, {'end': 1244.731, 'text': 'maybe very small numbers of days, using prompting, and this is revolutionizing what AI applications can be built quickly.', 'start': 1237.465, 'duration': 7.266}, {'end': 1246.759, 'text': 'One important caveat.', 'start': 1245.698, 'duration': 1.061}, {'end': 1255.987, 'text': 'this applies to many unstructured data applications, including specifically text applications and maybe increasingly vision applications,', 'start': 1246.759, 'duration': 9.228}, {'end': 1260.611, 'text': "although the vision technology is much less mature right now, but it's kind of getting there.", 'start': 1255.987, 'duration': 4.624}, {'end': 1265.416, 'text': "This recipe doesn't really work for structured data applications,", 'start': 1260.631, 'duration': 4.785}], 'summary': 'Ai applications can now be built in minutes or hours, revolutionizing unstructured data applications, including text and vision.', 'duration': 34.837, 'max_score': 1230.579, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE1230579.jpg'}], 'start': 965.302, 'title': 'Api optimization and prompting-based workflow', 'summary': 'Discusses optimizing api usage, securing api keys, and analyzing token limits, emphasizing secure api key storage. it also highlights the reduction in time from months to minutes using prompting for unstructured data ai applications, along with its application, limitations, and workflow impact.', 'chapters': [{'end': 1149.867, 'start': 965.302, 'title': 'Optimizing api usage and securing api keys', 'summary': 'Discusses analyzing token usage in openai api calls, securing api keys, and the significance of token limits, emphasizing the importance of secure api key storage and the impact of prompting on ai application development.', 'duration': 184.565, 'highlights': ['The output used 92 tokens, comprising 55 tokens from the response and 37 tokens from the prompt input. 3', 'The importance of checking token usage when exceeding the 4000 token limits of chat GPT is highlighted. 2', 'Using library.env to securely store and access API keys is recommended, emphasizing the significance of avoiding plain text storage for API keys. 1', 'The underappreciated impact of prompting on AI application development is mentioned. 0']}, {'end': 1549.123, 'start': 1150.808, 'title': 'Prompting-based machine learning workflow', 'summary': 'Discusses the contrast between traditional supervised machine learning and prompting-based machine learning, highlighting the significant reduction in time from months to minutes or hours in building ai applications for unstructured data using prompting, along with its application, limitations, and impact on the workflow.', 'duration': 398.315, 'highlights': ['Prompting-based machine learning allows for the rapid development of AI applications for unstructured data, reducing the time from months to minutes or hours, revolutionizing the speed at which AI applications can be built. The prompting-based machine learning workflow enables the rapid development of AI applications for unstructured data, significantly reducing the time from months to minutes or hours, thereby revolutionizing the speed at which AI applications can be built.', 'The traditional supervised machine learning workflow for building a classifier may take weeks or even a few months to gather labeled data, train, tune, evaluate the model, and deploy it, contrasting with the prompt-based approach, which takes minutes or hours to specify a prompt and have it running using API calls. The traditional supervised machine learning workflow for building a classifier may take weeks or even a few months to gather labeled data, train, tune, evaluate the model, and deploy it. In contrast, the prompt-based approach takes minutes or hours to specify a prompt and have it running using API calls.', 'The rapid development using prompting applies to unstructured data applications, particularly text applications, and may extend to vision applications, but not to structured data applications like machine learning on tabular data with numerical values. The rapid development using prompting applies to unstructured data applications, particularly text applications, and may extend to vision applications, but not to structured data applications like machine learning on tabular data with numerical values.', 'The ability to quickly build AI components is changing the workflow of building entire systems, reducing the time required for this piece of the system, although building the entire system may still take days or weeks. The ability to quickly build AI components is changing the workflow of building entire systems, reducing the time required for this piece of the system, although building the entire system may still take days or weeks.', 'The section emphasizes the importance of classifying customer queries and providing specific instructions based on the classification, demonstrating the use of fixed categories and hard-coded instructions in building a customer service assistant. The section emphasizes the importance of classifying customer queries and providing specific instructions based on the classification, demonstrating the use of fixed categories and hard-coded instructions in building a customer service assistant.', 'The classification of customer inquiries enables the provision of specific instructions, enhancing the handling of different cases and ensuring the quality and safety of the system. The classification of customer inquiries enables the provision of specific instructions, enhancing the handling of different cases and ensuring the quality and safety of the system.']}], 'duration': 583.821, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE965302.jpg', 'highlights': ['The rapid development using prompting applies to unstructured data applications, particularly text applications, and may extend to vision applications, but not to structured data applications like machine learning on tabular data with numerical values.', 'Using library.env to securely store and access API keys is recommended, emphasizing the significance of avoiding plain text storage for API keys.', 'The importance of checking token usage when exceeding the 4000 token limits of chat GPT is highlighted.', 'The ability to quickly build AI components is changing the workflow of building entire systems, reducing the time required for this piece of the system, although building the entire system may still take days or weeks.']}, {'end': 2483.817, 'segs': [{'end': 1604.64, 'src': 'embed', 'start': 1570.932, 'weight': 0, 'content': [{'end': 1576.915, 'text': "it can be important to first check that people are using the system responsibly and that they're not trying to abuse the system in some way.", 'start': 1570.932, 'duration': 5.983}, {'end': 1580.276, 'text': "In this video, we'll walk through a few strategies to do this.", 'start': 1577.775, 'duration': 2.501}, {'end': 1587.56, 'text': "We'll learn how to moderate content using the OpenAI Moderation API, and also how to use different prompts to detect prompt injections.", 'start': 1580.897, 'duration': 6.663}, {'end': 1588.741, 'text': "So let's dive in.", 'start': 1588.12, 'duration': 0.621}, {'end': 1593.892, 'text': "One effective tool for content moderation is OpenAI's Moderation API.", 'start': 1589.929, 'duration': 3.963}, {'end': 1599.336, 'text': "The Moderation API is designed to ensure content compliance with OpenAI's usage policies,", 'start': 1594.633, 'duration': 4.703}, {'end': 1604.64, 'text': 'and these policies reflect our commitment to ensuring the safe and responsible use of AI technology.', 'start': 1599.336, 'duration': 5.304}], 'summary': 'Strategies for responsible system use and content moderation with openai tools.', 'duration': 33.708, 'max_score': 1570.932, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE1570932.jpg'}, {'end': 1789.525, 'src': 'embed', 'start': 1750.085, 'weight': 3, 'content': [{'end': 1750.625, 'text': 'the developer.', 'start': 1750.085, 'duration': 0.54}, {'end': 1755.987, 'text': "For example, if you're building a customer service bot designed to answer product-related questions,", 'start': 1751.385, 'duration': 4.602}, {'end': 1761.51, 'text': 'a user might try to inject a prompt that asks the bot to complete their homework or generate a fake news article.', 'start': 1755.987, 'duration': 5.523}, {'end': 1766.032, 'text': 'Prompt injections can lead to unintended AI system usage,', 'start': 1762.691, 'duration': 3.341}, {'end': 1770.693, 'text': "so it's important to detect and prevent them to ensure responsible and cost-effective applications.", 'start': 1766.032, 'duration': 4.661}, {'end': 1772.534, 'text': "We'll go through two strategies.", 'start': 1771.393, 'duration': 1.141}, {'end': 1776.294, 'text': 'The first is using delimiters and clear instructions in the system message,', 'start': 1772.634, 'duration': 3.66}, {'end': 1781.596, 'text': 'and the second is using an additional prompt which asks if the user is trying to carry out a prompt injection.', 'start': 1776.294, 'duration': 5.302}, {'end': 1789.525, 'text': 'So in the example in the slide, the user is asking the system to forget its previous instructions and do something else.', 'start': 1782.677, 'duration': 6.848}], 'summary': 'Prevent prompt injections in ai systems through delimiters and clear instructions, and additional prompts, to ensure responsible and cost-effective applications.', 'duration': 39.44, 'max_score': 1750.085, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE1750085.jpg'}, {'end': 2138.557, 'src': 'embed', 'start': 2115.378, 'weight': 6, 'content': [{'end': 2124.947, 'text': 'so we can reframe the query to request a series of relevant reasoning steps before the model provides a final answer so that it can think longer and more methodically about the problem.', 'start': 2115.378, 'duration': 9.569}, {'end': 2131.693, 'text': 'And in general, we call the strategy of asking the model to reason about a problem in steps Chain of thought reasoning.', 'start': 2125.668, 'duration': 6.025}, {'end': 2138.557, 'text': 'For some applications, the reasoning process that a model uses to arrive at a final answer would be inappropriate to share with the user.', 'start': 2132.013, 'duration': 6.544}], 'summary': 'Request reasoning steps to allow longer, methodical thinking. call it chain of thought reasoning.', 'duration': 23.179, 'max_score': 2115.378, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE2115378.jpg'}, {'end': 2177.332, 'src': 'embed', 'start': 2150.004, 'weight': 7, 'content': [{'end': 2152.746, 'text': 'In a monologue is a tactic that can be used to mitigate this.', 'start': 2150.004, 'duration': 2.742}, {'end': 2157.109, 'text': "And this is just a fancy way of saying hiding the model's reasoning from the user.", 'start': 2153.406, 'duration': 3.703}, {'end': 2166.829, 'text': 'The idea of inner monologue is to instruct the model to put parts of the output that are meant to be hidden from the user into a structured format that makes passing them easy.', 'start': 2158.746, 'duration': 8.083}, {'end': 2172.752, 'text': 'Then, before presenting the output to the user, the output is passed and only part of the output is made visible.', 'start': 2167.59, 'duration': 5.162}, {'end': 2177.332, 'text': 'So remember the classification problem from a previous video,', 'start': 2174.249, 'duration': 3.083}], 'summary': 'Using inner monologue can hide model reasoning from user, making only part of output visible.', 'duration': 27.328, 'max_score': 2150.004, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE2150004.jpg'}, {'end': 2427.747, 'src': 'embed', 'start': 2401.434, 'weight': 8, 'content': [{'end': 2408.661, 'text': "And so what we're hoping for is that the model takes all of these different steps and realizes that the user has made an incorrect assumption,", 'start': 2401.434, 'duration': 7.227}, {'end': 2412.826, 'text': 'and then follows the final step to politely correct the user.', 'start': 2408.661, 'duration': 4.165}, {'end': 2421.862, 'text': "And so within this one prompt, we've actually maintained a number of different complex states that the system could be in.", 'start': 2414.997, 'duration': 6.865}, {'end': 2427.747, 'text': 'So at any given point, that could be a different output from the previous step, and we would want to do something different.', 'start': 2421.982, 'duration': 5.765}], 'summary': "Model aims to correct user's incorrect assumptions, maintaining complex states.", 'duration': 26.313, 'max_score': 2401.434, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE2401434.jpg'}], 'start': 1549.743, 'title': 'Ai content moderation and reasoning', 'summary': "Covers content moderation strategies using openai's moderation api for identifying and filtering prohibited content, and discusses avoiding prompt injections in ai systems by using delimiters and additional prompts. it also explores the concept of chain of thought reasoning and demonstrates its effectiveness in guiding a model's response.", 'chapters': [{'end': 1727.205, 'start': 1549.743, 'title': 'Content moderation strategies', 'summary': "Discusses strategies for content moderation, including the use of openai's moderation api to identify and filter prohibited content in various categories such as hate, self-harm, sexual, and violence, ensuring safe and responsible use of ai technology, and providing examples of using the api for monitoring inputs.", 'duration': 177.462, 'highlights': ["OpenAI's Moderation API helps developers identify and filter prohibited content in various categories such as hate, self-harm, sexual, and violence, and it's completely free to use for monitoring inputs and outputs of OpenAI APIs. The Moderation API ensures content compliance and reflects commitment to safe and responsible use of AI technology, offering precise moderation in specific subcategories and being free for monitoring inputs and outputs.", "The chapter discusses strategies for content moderation, including the use of OpenAI's Moderation API to identify and filter prohibited content in various categories such as hate, self-harm, sexual, and violence, ensuring safe and responsible use of AI technology, and providing examples of using the API for monitoring inputs. The chapter covers various strategies for content moderation and highlights the importance of using the Moderation API to ensure responsible use of AI technology, with specific examples of monitoring inputs.", 'The chapter emphasizes the importance of ensuring that users are using the system responsibly and not trying to abuse it in any way, and provides strategies to achieve this. The chapter focuses on the importance of responsible user behavior and offers strategies to prevent system abuse, ensuring a responsible and safe user experience.']}, {'end': 2083.554, 'start': 1728.314, 'title': 'Avoiding prompt injections in ai systems', 'summary': 'Discusses prompt injections in ai systems, strategies to avoid them, and the use of delimiters and additional prompts to prevent prompt injections. it also highlights the importance of detecting and preventing prompt injections to ensure responsible and cost-effective applications.', 'duration': 355.24, 'highlights': ['The chapter discusses prompt injections in AI systems and strategies to avoid them The chapter provides insights into prompt injections in AI systems and strategies to prevent them, emphasizing the importance of detecting and preventing prompt injections.', 'Use of delimiters and clear instructions in the system message to prevent prompt injections The chapter explains the use of delimiters and clear instructions in the system message to prevent prompt injections, ensuring that user attempts to manipulate the AI system are detected and prevented.', 'Importance of detecting and preventing prompt injections to ensure responsible and cost-effective applications It emphasizes the importance of detecting and preventing prompt injections to ensure responsible and cost-effective applications of AI systems.']}, {'end': 2483.817, 'start': 2089.757, 'title': 'Chain of thought reasoning', 'summary': "Discusses the concept of chain of thought reasoning, the tactic of inner monologue, and provides an example of using step-by-step reasoning to guide a model's response to a user query, demonstrating its effectiveness by correcting a user's incorrect assumption.", 'duration': 394.06, 'highlights': ['The chapter discusses the concept of Chain of thought reasoning Explains the importance of allowing a model to reason in detail about a problem before answering a specific question.', "The tactic of inner monologue is a tactic that can be used to mitigate sharing the model's reasoning process with the user Describes the tactic of inner monologue as a way of hiding the model's reasoning from the user, particularly in applications like tutoring where revealing the model's reasoning process could reveal the answer to the student.", "Using step-by-step reasoning to guide a model's response to a user query Details the process of asking the model to reason about the answer before coming to its conclusion through a series of steps, and provides an example of guiding a model through reasoning to correct a user's incorrect assumption."]}], 'duration': 934.074, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE1549743.jpg', 'highlights': ["OpenAI's Moderation API helps identify and filter prohibited content in various categories, ensuring safe and responsible use of AI technology.", 'Strategies for content moderation include using the Moderation API to ensure responsible use of AI technology and providing examples of monitoring inputs.', 'The chapter emphasizes the importance of responsible user behavior and offers strategies to prevent system abuse, ensuring a responsible and safe user experience.', 'The chapter provides insights into prompt injections in AI systems and strategies to prevent them, emphasizing the importance of detecting and preventing prompt injections.', 'Use of delimiters and clear instructions in the system message to prevent prompt injections, ensuring that user attempts to manipulate the AI system are detected and prevented.', 'It emphasizes the importance of detecting and preventing prompt injections to ensure responsible and cost-effective applications of AI systems.', 'The chapter discusses the concept of Chain of thought reasoning, explaining the importance of allowing a model to reason in detail about a problem before answering a specific question.', "Describes the tactic of inner monologue as a way of hiding the model's reasoning from the user, particularly in applications like tutoring where revealing the model's reasoning process could reveal the answer to the student.", "Details the process of asking the model to reason about the answer before coming to its conclusion through a series of steps, and provides an example of guiding a model through reasoning to correct a user's incorrect assumption."]}, {'end': 2874.055, 'segs': [{'end': 2552.69, 'src': 'embed', 'start': 2483.817, 'weight': 0, 'content': [{'end': 2486.919, 'text': "And so let's see another example of a user message.", 'start': 2483.817, 'duration': 3.102}, {'end': 2491.483, 'text': 'And also at this point, feel free to pause the video and try your own messages.', 'start': 2488, 'duration': 3.483}, {'end': 2496.386, 'text': "So let's format this user message.", 'start': 2494.765, 'duration': 1.621}, {'end': 2502.411, 'text': "So the question is, do you sell TVs? And if you remember in our product list, we've only listed different computers.", 'start': 2496.607, 'duration': 5.804}, {'end': 2504.993, 'text': "So let's see what the model says.", 'start': 2503.512, 'duration': 1.481}, {'end': 2513.821, 'text': 'So in this case, step one, the user is asking if the store sells TVs, but TVs are not listed in the available products.', 'start': 2507.455, 'duration': 6.366}, {'end': 2522.408, 'text': 'So as you can see, the model then skips to the response to user step because it realizes that the intermediary steps are not actually necessary.', 'start': 2513.921, 'duration': 8.487}, {'end': 2526.212, 'text': 'I will say that we did ask for the output in this specific format.', 'start': 2523.149, 'duration': 3.063}, {'end': 2529.595, 'text': "So technically, the model hasn't exactly followed our request.", 'start': 2526.252, 'duration': 3.343}, {'end': 2532.257, 'text': 'Again, more advanced models will be better at doing that.', 'start': 2530.256, 'duration': 2.001}, {'end': 2537.886, 'text': "And so in this case, our response to the user is, I'm sorry, but we do not sell TVs at the store.", 'start': 2533.118, 'duration': 4.768}, {'end': 2540.029, 'text': 'And then it lists the available products.', 'start': 2538.507, 'duration': 1.522}, {'end': 2545.358, 'text': 'So again, feel free to try some of your own responses.', 'start': 2543.014, 'duration': 2.344}, {'end': 2550.029, 'text': 'And so now we only really want this part of the response.', 'start': 2546.206, 'duration': 3.823}, {'end': 2552.69, 'text': "We wouldn't wanna show the earlier parts to the user.", 'start': 2550.229, 'duration': 2.461}], 'summary': 'Model skips intermediary steps, responds to user asking about tv availability.', 'duration': 68.873, 'max_score': 2483.817, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE2483817.jpg'}, {'end': 2708.425, 'src': 'embed', 'start': 2675.21, 'weight': 8, 'content': [{'end': 2679.794, 'text': 'And in general, finding the optimal trade-off in prompt complexity requires some experimentation.', 'start': 2675.21, 'duration': 4.584}, {'end': 2684.077, 'text': 'So definitely good to try a number of different prompts before deciding to use one.', 'start': 2680.374, 'duration': 3.703}, {'end': 2693.162, 'text': "And in the next video we'll learn another strategy to handle complex tasks by splitting these complex tasks into a series of simpler subtasks,", 'start': 2685.08, 'duration': 8.082}, {'end': 2695.722, 'text': 'rather than trying to do the whole task in one prompt.', 'start': 2693.162, 'duration': 2.56}, {'end': 2708.425, 'text': "In this video, we'll learn how to split complex tasks into a series of simpler subtasks by chaining multiple prompts together.", 'start': 2701.884, 'duration': 6.541}], 'summary': 'Optimal prompt complexity requires experimentation. try different prompts. split complex tasks into simpler subtasks.', 'duration': 33.215, 'max_score': 2675.21, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE2675210.jpg'}, {'end': 2781.674, 'src': 'embed', 'start': 2754.049, 'weight': 4, 'content': [{'end': 2759.654, 'text': 'Chaining prompts, on the other hand, is like cooking the meal in stages, where you focus on one component at a time,', 'start': 2754.049, 'duration': 5.605}, {'end': 2762.396, 'text': 'ensuring that each part is cooked correctly before moving on to the next.', 'start': 2759.654, 'duration': 2.742}, {'end': 2768.525, 'text': 'This approach breaks down the complexity of the task, making it easier to manage and reducing the likelihood of errors.', 'start': 2763.257, 'duration': 5.268}, {'end': 2773.251, 'text': 'However, this approach might be unnecessary and overcomplicated for a very simple recipe.', 'start': 2769.105, 'duration': 4.146}, {'end': 2781.674, 'text': 'A slightly better analogy for the same thing is the difference between reading spaghetti code with everything in one long file and a simple modular program.', 'start': 2774.366, 'duration': 7.308}], 'summary': 'Chaining prompts simplifies task, reducing errors. like cooking meal in stages.', 'duration': 27.625, 'max_score': 2754.049, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE2754049.jpg'}, {'end': 2816.918, 'src': 'embed', 'start': 2789.241, 'weight': 5, 'content': [{'end': 2793.065, 'text': 'The same can be true of a complex single step task submitted to a language model.', 'start': 2789.241, 'duration': 3.824}, {'end': 2802.871, 'text': 'Chaining prompts is a powerful strategy when you have a workflow where you can maintain the state of the system at any given point and take different actions depending on the current state.', 'start': 2793.886, 'duration': 8.985}, {'end': 2810.955, 'text': "And so an example of the current state would be after you've classified an incoming customer query, the state would be the classification.", 'start': 2803.431, 'duration': 7.524}, {'end': 2813.956, 'text': "So it's an account question or it's a product question.", 'start': 2811.035, 'duration': 2.921}, {'end': 2816.918, 'text': 'And then based on the state, you might do something different.', 'start': 2814.857, 'duration': 2.061}], 'summary': 'Chaining prompts is powerful for workflow, based on state classification.', 'duration': 27.677, 'max_score': 2789.241, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE2789241.jpg'}, {'end': 2850.883, 'src': 'embed', 'start': 2825.014, 'weight': 6, 'content': [{'end': 2830.516, 'text': 'makes sure the model has all the information it needs to carry out a task and reduces the likelihood of errors.', 'start': 2825.014, 'duration': 5.502}, {'end': 2831.116, 'text': 'as I mentioned,', 'start': 2830.516, 'duration': 0.6}, {'end': 2834.517, 'text': 'This approach can also reduce and lower costs,', 'start': 2831.996, 'duration': 2.521}, {'end': 2840.418, 'text': 'since longer prompts with more tokens cost more to run and outlining all steps might be unnecessary in some cases.', 'start': 2834.517, 'duration': 5.901}, {'end': 2849.383, 'text': 'Another benefit of this approach is that it is also easier to test which steps might be failing more often or to have a human in the loop at a specific step.', 'start': 2841.278, 'duration': 8.105}, {'end': 2850.883, 'text': 'So, to summarize,', 'start': 2849.863, 'duration': 1.02}], 'summary': 'Reducing errors and costs, easier testing, and human oversight are benefits of providing sufficient information to the model.', 'duration': 25.869, 'max_score': 2825.014, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE2825014.jpg'}], 'start': 2483.817, 'title': 'User message processing and chaining prompts for complex tasks', 'summary': 'Covers user message processing including handling requests not in the product list, and discusses chaining prompts to simplify complex tasks, reducing errors and costs, with an analogy to cooking and emphasis on system state maintenance.', 'chapters': [{'end': 2552.69, 'start': 2483.817, 'title': 'User message processing', 'summary': 'Discusses the processing of a user message, where a request for a tv is not available in the product list, leading the model to skip intermediary steps and respond appropriately, with improvements expected in advanced models.', 'duration': 68.873, 'highlights': ['The model skips intermediary steps when a requested product is not available, demonstrating efficient processing.', "The response to the user is 'I'm sorry, but we do not sell TVs at the store', showcasing the model's ability to provide accurate information.", 'Advanced models are expected to improve the accuracy of responses and adherence to specific formats.', 'Encouragement for viewers to try their own responses to further understand the user message processing.']}, {'end': 2874.055, 'start': 2553.071, 'title': 'Chaining prompts for complex tasks', 'summary': 'Discusses the strategy of chaining prompts to split complex tasks into simpler subtasks, explaining the benefits of this approach in reducing errors, managing complexity, and lowering costs, while also providing an analogy with cooking and highlighting the importance of maintaining the state of the system.', 'duration': 320.984, 'highlights': ['Chaining prompts is like cooking a meal in stages, reducing complexity, managing errors, and ensuring each part is handled correctly, analogous to reading spaghetti code versus a simple modular program. (relevance: 5)', 'Chaining prompts reduces errors, manages complexity, and lowers costs, making it easier to test failing steps and have a human in the loop at specific points. (relevance: 4)', 'The chapter emphasizes the importance of maintaining the state of the system to carry out tasks and reduce errors, making it easier to manage and ensuring the model has all the necessary information. (relevance: 3)', 'Using one long, complicated instruction can be challenging to manage and keep track of, while chaining prompts simplifies the task and reduces the likelihood of errors. (relevance: 2)', 'The chapter advises trying different prompts to find the optimal trade-off in complexity and mentions the potential convoluted nature of some prompts, encouraging experimentation. (relevance: 1)']}], 'duration': 390.238, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE2483817.jpg', 'highlights': ['The model skips intermediary steps when a requested product is not available, demonstrating efficient processing.', "The response to the user is 'I'm sorry, but we do not sell TVs at the store', showcasing the model's ability to provide accurate information.", 'Advanced models are expected to improve the accuracy of responses and adherence to specific formats.', 'Encouragement for viewers to try their own responses to further understand the user message processing.', 'Chaining prompts is like cooking a meal in stages, reducing complexity, managing errors, and ensuring each part is handled correctly, analogous to reading spaghetti code versus a simple modular program.', 'Chaining prompts reduces errors, manages complexity, and lowers costs, making it easier to test failing steps and have a human in the loop at specific points.', 'The chapter emphasizes the importance of maintaining the state of the system to carry out tasks and reduce errors, making it easier to manage and ensuring the model has all the necessary information.', 'Using one long, complicated instruction can be challenging to manage and keep track of, while chaining prompts simplifies the task and reduces the likelihood of errors.', 'The chapter advises trying different prompts to find the optimal trade-off in complexity and mentions the potential convoluted nature of some prompts, encouraging experimentation.']}, {'end': 3402.451, 'segs': [{'end': 2911.907, 'src': 'embed', 'start': 2874.055, 'weight': 0, 'content': [{'end': 2877.679, 'text': 'as these are the cases where it could become hard for the model to reason about what to do.', 'start': 2874.055, 'duration': 3.624}, {'end': 2883.266, 'text': "And as you build with and interact with these models more, you'll gain an intuition for when to use this strategy versus the previous.", 'start': 2877.939, 'duration': 5.327}, {'end': 2892.357, 'text': "And one additional benefit that I didn't mention yet is that it also allows the model to use external tools at certain points of the workflow if necessary.", 'start': 2883.971, 'duration': 8.386}, {'end': 2899.563, 'text': 'For example, it might decide to look something up in a product catalog or call an API or search a knowledge base,', 'start': 2892.858, 'duration': 6.705}, {'end': 2901.565, 'text': 'something that could not be achieved with a single prompt.', 'start': 2899.563, 'duration': 2.002}, {'end': 2904.367, 'text': "So with that, let's dive into an example.", 'start': 2902.605, 'duration': 1.762}, {'end': 2911.907, 'text': "So we're going to use the same example as in the previous video, where we want to answer a customer's question about a specific product,", 'start': 2905.663, 'duration': 6.244}], 'summary': 'Model can use external tools, enhancing workflow efficiency and decision-making.', 'duration': 37.852, 'max_score': 2874.055, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE2874055.jpg'}, {'end': 3189.021, 'src': 'embed', 'start': 3162.672, 'weight': 5, 'content': [{'end': 3169.359, 'text': 'And so the products is just a dictionary from product name to this object that contains the information about the product.', 'start': 3162.672, 'duration': 6.687}, {'end': 3172.876, 'text': 'Notice that each product has a category.', 'start': 3171.275, 'duration': 1.601}, {'end': 3177.197, 'text': 'So remember, we want to look up information about the products that the user asks about.', 'start': 3173.496, 'duration': 3.701}, {'end': 3183.379, 'text': 'So we need to define some helper functions to allow us to look up product information by product name.', 'start': 3177.277, 'duration': 6.102}, {'end': 3189.021, 'text': "So let's create a function, get product by name.", 'start': 3184.56, 'duration': 4.461}], 'summary': 'Creating a function to look up product information by name.', 'duration': 26.349, 'max_score': 3162.672, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE3162672.jpg'}, {'end': 3316.56, 'src': 'embed', 'start': 3283.02, 'weight': 3, 'content': [{'end': 3285.062, 'text': "So let's get the product information by name.", 'start': 3283.02, 'duration': 2.042}, {'end': 3290.286, 'text': "So here you can see we've just fetched all of the product information.", 'start': 3287.604, 'duration': 2.682}, {'end': 3296.256, 'text': "And let's do an example to get all of the products for a category.", 'start': 3292.935, 'duration': 3.321}, {'end': 3300.817, 'text': "So let's get all of the products in the computers and laptops category.", 'start': 3296.356, 'duration': 4.461}, {'end': 3306.838, 'text': 'So here you see we fetched all of the products with this category.', 'start': 3302.737, 'duration': 4.101}, {'end': 3316.56, 'text': "So Let's continue our example.", 'start': 3312.72, 'duration': 3.84}], 'summary': 'Fetched product information by name and category.', 'duration': 33.54, 'max_score': 3283.02, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE3283020.jpg'}], 'start': 2874.055, 'title': 'Model reasoning and product categorization', 'summary': 'Discusses strategies for model reasoning, enabling the use of external tools and breaking down tasks into prompts. it also explains the process of product categorization and information retrieval, emphasizing structured responses and providing categorization and retrieval function examples.', 'chapters': [{'end': 2936.175, 'start': 2874.055, 'title': 'Model reasoning and external tools', 'summary': 'Discusses the strategy of breaking down tasks into multiple prompts to aid model reasoning and enable the use of external tools, allowing for tasks such as looking up information in a product catalog or knowledge base, and calling apis, which cannot be achieved with a single prompt.', 'duration': 62.12, 'highlights': ['Breaking down tasks into multiple prompts aids model reasoning and enables the use of external tools, such as looking up information in a product catalog or knowledge base, and calling APIs.', 'This approach allows the model to gain intuition for when to use this strategy and also provides the flexibility to use external tools at certain points of the workflow if necessary.', 'Using multiple prompts also benefits in scenarios where interacting with models becomes more complex, as it helps in reasoning about what to do in such cases.']}, {'end': 3402.451, 'start': 2937.075, 'title': 'Product categorization and information retrieval', 'summary': 'Explains the process of categorizing customer queries and retrieving information about products, emphasizing the importance of structured response and providing examples of categorization and information retrieval functions.', 'duration': 465.376, 'highlights': ['The chapter explains the process of categorizing customer queries and retrieving information about products The main focus of the chapter, providing an overview of the key tasks involved.', 'Emphasizes the importance of structured response Highlighting the benefit of structured response for further processing and readability.', 'Providing examples of categorization and information retrieval functions Demonstrating the practical implementation of helper functions to categorize and retrieve product information.']}], 'duration': 528.396, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE2874055.jpg', 'highlights': ['Breaking down tasks into multiple prompts aids model reasoning and enables the use of external tools, such as looking up information in a product catalog or knowledge base, and calling APIs.', 'Using multiple prompts also benefits in scenarios where interacting with models becomes more complex, as it helps in reasoning about what to do in such cases.', 'This approach allows the model to gain intuition for when to use this strategy and also provides the flexibility to use external tools at certain points of the workflow if necessary.', 'The chapter explains the process of categorizing customer queries and retrieving information about products The main focus of the chapter, providing an overview of the key tasks involved.', 'Emphasizes the importance of structured response Highlighting the benefit of structured response for further processing and readability.', 'Providing examples of categorization and information retrieval functions Demonstrating the practical implementation of helper functions to categorize and retrieve product information.']}, {'end': 4543.835, 'segs': [{'end': 3494.373, 'src': 'embed', 'start': 3452.427, 'weight': 0, 'content': [{'end': 3458.528, 'text': 'And so to do this, we need to put the product information into a nice string format that we can add to the prompt.', 'start': 3452.427, 'duration': 6.101}, {'end': 3460.849, 'text': "And so let's also create a helper function to do this.", 'start': 3458.648, 'duration': 2.201}, {'end': 3469.677, 'text': "So we're going to call it generate output string, and it's going to take in the list of data that we just created.", 'start': 3463.154, 'duration': 6.523}, {'end': 3475.44, 'text': "So this, and then I'm going to copy in some code and then we'll walk through what it's doing.", 'start': 3469.697, 'duration': 5.743}, {'end': 3481.062, 'text': "So now I'm going to paste in some code and show you an example, and then we'll talk about what this function is doing.", 'start': 3476.44, 'duration': 4.622}, {'end': 3487.361, 'text': "So We're going to get the product information from our first user message.", 'start': 3481.963, 'duration': 5.398}, {'end': 3494.373, 'text': "And so we're going to use this helper function, generate output string on our category and product list, which if we remember was this.", 'start': 3487.902, 'duration': 6.471}], 'summary': 'Creating a helper function to generate output string for product information.', 'duration': 41.946, 'max_score': 3452.427, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE3452427.jpg'}, {'end': 3587.709, 'src': 'embed', 'start': 3560.475, 'weight': 3, 'content': [{'end': 3565.36, 'text': "So at this point, we've found the relevant product information to answer the user question.", 'start': 3560.475, 'duration': 4.885}, {'end': 3568.043, 'text': "Now it's time for the model to actually answer the question.", 'start': 3565.7, 'duration': 2.343}, {'end': 3571.987, 'text': "So let's have our system message.", 'start': 3568.643, 'duration': 3.344}, {'end': 3573.809, 'text': 'So this is the instruction.', 'start': 3572.147, 'duration': 1.662}, {'end': 3576.872, 'text': "You're a customer service assistant for a large electronic store.", 'start': 3574.289, 'duration': 2.583}, {'end': 3582.205, 'text': "Respond in a friendly and helpful tone with let's say, with very concise answers.", 'start': 3577.172, 'duration': 5.033}, {'end': 3584.987, 'text': 'Make sure to ask the user relevant follow-up questions.', 'start': 3582.645, 'duration': 2.342}, {'end': 3587.709, 'text': 'So we want this to be an interactive experience for the user.', 'start': 3585.007, 'duration': 2.702}], 'summary': 'System to provide concise, friendly responses and ask follow-up questions for interactive user experience.', 'duration': 27.234, 'max_score': 3560.475, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE3560475.jpg'}, {'end': 3699.726, 'src': 'embed', 'start': 3669.656, 'weight': 2, 'content': [{'end': 3673.297, 'text': 'So, as you can see, by breaking this up into a series of steps,', 'start': 3669.656, 'duration': 3.641}, {'end': 3680.819, 'text': 'we were able to load information relevant to the user query to give the model the relevant context it needed to answer the question effectively.', 'start': 3673.297, 'duration': 7.522}, {'end': 3688.681, 'text': 'So you might be wondering why are we selectively loading product descriptions into the prompt instead of including all of them and letting the model use the information it needs?', 'start': 3681.279, 'duration': 7.402}, {'end': 3699.726, 'text': "And so what I mean by this is why didn't we just include all of this product information in the prompt and we wouldn't have to bother with all of those intermediate steps to actually look up the product information?", 'start': 3689.301, 'duration': 10.425}], 'summary': 'Breaking down the process into steps provided relevant context for effective question answering.', 'duration': 30.07, 'max_score': 3669.656, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE3669656.jpg'}, {'end': 3857.587, 'src': 'embed', 'start': 3832.566, 'weight': 5, 'content': [{'end': 3837.892, 'text': 'One of the key advantages of using text embeddings is that they enable fuzzy or semantic search,', 'start': 3832.566, 'duration': 5.326}, {'end': 3841.395, 'text': 'which allows you to find relevant information without using the exact keywords.', 'start': 3837.892, 'duration': 3.503}, {'end': 3849.324, 'text': "So in our example, we wouldn't necessarily need the exact name of the product, but we could do a more a search with a more general query,", 'start': 3841.956, 'duration': 7.368}, {'end': 3850.405, 'text': 'like mobile phone.', 'start': 3849.324, 'duration': 1.081}, {'end': 3856.625, 'text': "We're planning to create a comprehensive course on how to use embeddings for various applications soon.", 'start': 3851.957, 'duration': 4.668}, {'end': 3857.587, 'text': 'So stay tuned.', 'start': 3856.765, 'duration': 0.822}], 'summary': 'Text embeddings enable fuzzy or semantic search, facilitating relevant information retrieval without exact keywords. a comprehensive course on using embeddings for various applications is planned.', 'duration': 25.021, 'max_score': 3832.566, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE3832566.jpg'}, {'end': 3903.531, 'src': 'embed', 'start': 3874.928, 'weight': 6, 'content': [{'end': 3879.55, 'text': 'Checking outputs before showing them to users can be important for ensuring the quality,', 'start': 3874.928, 'duration': 4.622}, {'end': 3884.053, 'text': 'relevance and safety of the responses provided to them or used in automation flows.', 'start': 3879.55, 'duration': 4.503}, {'end': 3888.236, 'text': "We'll learn how to use the Moderation API, but this time for outputs,", 'start': 3884.714, 'duration': 3.522}, {'end': 3892.538, 'text': 'and how to use additional prompts to the model to evaluate output quality before displaying them.', 'start': 3888.236, 'duration': 4.302}, {'end': 3895.08, 'text': "So let's dive into the examples.", 'start': 3893.699, 'duration': 1.381}, {'end': 3899.603, 'text': "We've already discussed the Moderation API in the context of evaluating inputs.", 'start': 3895.901, 'duration': 3.702}, {'end': 3903.531, 'text': "Now let's revisit it in the context of checking outputs.", 'start': 3900.41, 'duration': 3.121}], 'summary': 'Using moderation api to evaluate output quality and safety before display.', 'duration': 28.603, 'max_score': 3874.928, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE3874928.jpg'}, {'end': 4267.094, 'src': 'embed', 'start': 4238.703, 'weight': 7, 'content': [{'end': 4242.984, 'text': "In the next video we're going to put together everything we've learned in the evaluate input section,", 'start': 4238.703, 'duration': 4.281}, {'end': 4246.825, 'text': 'process section and checking output section to build an end-to-end system.', 'start': 4242.984, 'duration': 3.841}, {'end': 4257.972, 'text': "In this video, we'll put together everything we've learned in the previous videos to create an end-to-end example of a customer service assistant.", 'start': 4252.031, 'duration': 5.941}, {'end': 4259.713, 'text': "We'll go through the following steps.", 'start': 4258.392, 'duration': 1.321}, {'end': 4262.913, 'text': "First, we'll check the input to see if it flags the moderation API.", 'start': 4259.993, 'duration': 2.92}, {'end': 4267.094, 'text': "Second, if it doesn't, we'll extract the list of products.", 'start': 4263.913, 'duration': 3.181}], 'summary': 'Combining input, processing, and output to build an end-to-end system for a customer service assistant.', 'duration': 28.391, 'max_score': 4238.703, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE4238703.jpg'}], 'start': 3403.072, 'title': 'Building a customer service assistant', 'summary': 'Covers generating product information list, providing relevant product information to the model, using text embeddings for efficient knowledge retrieval, and building an end-to-end example of a customer service assistant.', 'chapters': [{'end': 3559.355, 'start': 3403.072, 'title': 'Generating product information list', 'summary': 'Covers generating a list of product information from user messages to aid the model in answering user questions, using helper functions and looping through product data.', 'duration': 156.283, 'highlights': ['Aiming to transform product information into a list to assist the model in answering user queries.', "Creating a helper function 'generate output string' to format the product information into a string for use in the prompt.", "Utilizing the 'generate output string' function to extract and organize product information from user messages, aiding the model in answering user questions."]}, {'end': 3812.546, 'start': 3560.475, 'title': 'Model answering user query with product information', 'summary': 'Discusses the process of providing relevant product information to the model to effectively answer user queries, highlighting the importance of selective loading, context limitations, and cost reduction in generating responses.', 'duration': 252.071, 'highlights': ['The model provides concise answers using relevant product information and asks follow-up questions to create an interactive user experience. The model delivers concise answers about Smart X Pro phone, Photosnap camera, and different televisions in stock, creating an interactive user experience.', "Selective loading of product information is essential to reduce the cost of generating responses and overcome language models' context limitations. Selective loading of product information reduces the cost of generating responses and overcomes language models' context limitations related to a fixed number of tokens allowed as input and output.", 'Language models, like GPT-4, are capable of ignoring irrelevant information when the context is well-structured, but context limitations and cost considerations make selective loading important. Advanced language models like GPT-4 can ignore irrelevant information in well-structured contexts, but context limitations and cost considerations make selective loading important.']}, {'end': 4237.842, 'start': 3813.43, 'title': 'Text embeddings for information retrieval', 'summary': 'Discusses the use of text embeddings for efficient knowledge retrieval, including enabling fuzzy or semantic search and the importance of checking outputs for quality, relevance and safety using the moderation api and asking the model to evaluate its own output.', 'duration': 424.412, 'highlights': ['The chapter discusses the use of text embeddings for efficient knowledge retrieval, including enabling fuzzy or semantic search. Text embeddings enable fuzzy or semantic search, allowing retrieval of relevant information without exact keywords, improving search efficiency.', 'The importance of checking outputs for quality, relevance, and safety using the Moderation API and asking the model to evaluate its own output. Checking outputs using the Moderation API is crucial for ensuring quality, relevance, and safety of responses, while asking the model to evaluate its own output provides immediate feedback for response quality.', 'The possibility of experimenting with generating multiple model responses per user query and then having the model choose the best one to show the user. Experimenting with generating multiple model responses per user query and selecting the best one offers the potential for improving user experience and response quality.']}, {'end': 4543.835, 'start': 4238.703, 'title': 'Building a customer service assistant', 'summary': 'Demonstrates building an end-to-end example of a customer service assistant, covering steps such as checking input, extracting products, looking them up, answering user questions, and running responses through a moderation api, with a python chatbot ui.', 'duration': 305.132, 'highlights': ['The chapter demonstrates the end-to-end process of building a customer service assistant, including steps such as checking input, extracting products, looking them up, answering user questions, and running responses through a moderation API, using a Python chatbot UI.', 'The chapter covers interacting with a customer service assistant, involving steps like asking about available TVs, inquiring about the cheapest and most expensive options, and seeking more information about a specific product.', 'The chapter discusses the process user message function for the customer service assistant, which involves checking input against a moderation API, extracting product lists, looking up products, and running responses through the moderation API.']}], 'duration': 1140.763, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE3403072.jpg', 'highlights': ["Utilizing the 'generate output string' function to extract and organize product information from user messages, aiding the model in answering user questions.", "Creating a helper function 'generate output string' to format the product information into a string for use in the prompt.", 'Aiming to transform product information into a list to assist the model in answering user queries.', 'The model provides concise answers using relevant product information and asks follow-up questions to create an interactive user experience.', "Selective loading of product information is essential to reduce the cost of generating responses and overcome language models' context limitations.", 'Text embeddings enable fuzzy or semantic search, allowing retrieval of relevant information without exact keywords, improving search efficiency.', 'The importance of checking outputs for quality, relevance, and safety using the Moderation API and asking the model to evaluate its own output.', 'The chapter demonstrates the end-to-end process of building a customer service assistant, including steps such as checking input, extracting products, looking them up, answering user questions, and running responses through a moderation API, using a Python chatbot UI.']}, {'end': 5645.311, 'segs': [{'end': 4566.085, 'src': 'embed', 'start': 4543.835, 'weight': 0, 'content': [{'end': 4551.999, 'text': "we've combined the techniques we've learned throughout the course to create a comprehensive system with a chain of steps that evaluates user inputs,", 'start': 4543.835, 'duration': 8.164}, {'end': 4554.019, 'text': 'processes them and then checks the output.', 'start': 4551.999, 'duration': 2.02}, {'end': 4561.563, 'text': 'By monitoring the quality of the system across a larger number of inputs, you can alter the steps and improve the overall performance of your system.', 'start': 4554.84, 'duration': 6.723}, {'end': 4566.085, 'text': 'Maybe we might find that our prompts could be better for some of the steps.', 'start': 4562.423, 'duration': 3.662}], 'summary': 'Combined techniques to create a comprehensive system for evaluating user inputs and improving overall performance.', 'duration': 22.25, 'max_score': 4543.835, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE4543835.jpg'}, {'end': 4726.769, 'src': 'embed', 'start': 4699.473, 'weight': 1, 'content': [{'end': 4702.817, 'text': 'because you can now get this working with zero training examples.', 'start': 4699.473, 'duration': 3.344}, {'end': 4708.801, 'text': 'So when building an application using an LLM, this is what it often feels like.', 'start': 4703.817, 'duration': 4.984}, {'end': 4718.308, 'text': 'First you would tune the prompts on just a small handful of examples maybe one to three to five examples and try to get a prompt that works on them.', 'start': 4709.181, 'duration': 9.127}, {'end': 4723.087, 'text': 'And then, as you have the system, undergo additional testing.', 'start': 4719.324, 'duration': 3.763}, {'end': 4726.769, 'text': 'you occasionally run into a few examples that are tricky.', 'start': 4723.087, 'duration': 3.682}], 'summary': 'Llm can work with zero training examples, tuning prompts on a small handful, and encountering occasional tricky examples during testing.', 'duration': 27.296, 'max_score': 4699.473, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE4699473.jpg'}, {'end': 5107.406, 'src': 'embed', 'start': 5077.321, 'weight': 2, 'content': [{'end': 5082.105, 'text': 'until the prompt is retrieving the relevant products and categories to the customer.', 'start': 5077.321, 'duration': 4.784}, {'end': 5085.928, 'text': 'requests for all of your prompts, all three of them in this example.', 'start': 5082.105, 'duration': 3.823}, {'end': 5092.733, 'text': 'And if the prompt had been missing some products or something,', 'start': 5089.13, 'duration': 3.603}, {'end': 5097.837, 'text': 'then what we would do is probably go back to edit the prompt a few times until it gets it right on all three of these prompts.', 'start': 5092.733, 'duration': 5.104}, {'end': 5107.406, 'text': "After you've gotten the system to this point, you might then start running the system in testing.", 'start': 5100.76, 'duration': 6.646}], 'summary': 'System retrieves products and categories for customer prompts, editing until accuracy, then moves to testing.', 'duration': 30.085, 'max_score': 5077.321, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE5077321.jpg'}, {'end': 5606.057, 'src': 'embed', 'start': 5580.33, 'weight': 3, 'content': [{'end': 5587.237, 'text': 'And so if you were to tune the prompts, you can rerun this to see if the percent correct goes up or down.', 'start': 5580.33, 'duration': 6.907}, {'end': 5594.585, 'text': 'What you just saw in this notebook is going through steps one, two, and three of this bulleted list.', 'start': 5587.898, 'duration': 6.687}, {'end': 5602.453, 'text': 'And this already gives a pretty good development set of 10 examples with which to tune and validate the prompt is working.', 'start': 5594.945, 'duration': 7.508}, {'end': 5606.057, 'text': 'If you needed an additional level of rigor,', 'start': 5603.294, 'duration': 2.763}], 'summary': 'Testing prompts can improve accuracy, as seen in 10 examples.', 'duration': 25.727, 'max_score': 5580.33, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE5580330.jpg'}], 'start': 4543.835, 'title': 'System performance evaluation and best practices', 'summary': 'Discusses creating a comprehensive system for user inputs, process evaluation, and improving system performance, along with unique methods for evaluating the outputs of a language model and prompt tuning for product retrieval, achieving 90% correctness in the development set.', 'chapters': [{'end': 4609.902, 'start': 4543.835, 'title': 'System performance evaluation', 'summary': 'Discusses creating a comprehensive system with a chain of steps to evaluate user inputs, process them, and improve system performance by monitoring and altering the steps based on the quality of the system across a larger number of inputs.', 'duration': 66.067, 'highlights': ['By monitoring the quality of the system across a larger number of inputs, you can alter the steps and improve the overall performance of your system.', 'Isa showed how to build an application using an LLM, from evaluating the inputs to processing the inputs, to then doing final output checking before showing it to a user.', "Discussing the need to track the system's performance, find shortcomings, and continue to improve the quality of the answers in the system."]}, {'end': 4926.496, 'start': 4609.902, 'title': 'Best practices for llm evaluation', 'summary': 'Discusses the unique methods of evaluating the outputs of a language model, highlighting the process of building an application using an llm, the iterative process of tuning prompts, and the importance of test sets for high-stakes applications.', 'duration': 316.594, 'highlights': ['The process of building an application using an LLM involves iteratively tuning prompts on a small handful of examples and gradually adding additional test examples opportunistically, leading to the development of metrics to measure performance on a small set of examples. Iterative prompt tuning, gradual addition of test examples, development of performance metrics', 'The importance of test sets for high-stakes applications is emphasized, with a recommendation to rigorously evaluate the performance of the system before use to mitigate the risk of harm or bias. Emphasis on test sets for high-stakes applications, rigorous evaluation for risk mitigation', "For applications with modest risk of harm, it is suggested to stop the evaluation process early without the need for collecting larger datasets, highlighting the flexibility in the evaluation process based on the application's risk profile. Flexibility in evaluation process, stopping early for low-risk applications"]}, {'end': 5379.866, 'start': 4927.517, 'title': 'Prompt tuning for product retrieval', 'summary': 'Discusses the process of tuning a prompt for product retrieval, utilizing few-shot prompting to refine the output, and emphasizing the need to avoid extraneous text by modifying the prompt, leading to improved accuracy in generating relevant product categories and details.', 'duration': 452.349, 'highlights': ['The process of tuning a prompt for product retrieval is explained, involving the utilization of few-shot prompting for refinement. The chapter emphasizes the iterative process of tuning a prompt by providing examples and continuously refining it until it yields appropriate outputs, using a few-shot prompting technique for this purpose.', 'The need to avoid extraneous text in the output is highlighted, leading to the modification of the prompt to specify output in JSON format, resulting in improved accuracy. Emphasizing the necessity to avoid extraneous text in the output, the prompt is modified to specify output only in JSON format, leading to a more accurate and relevant output.', 'Automation of the testing process is recommended as the development set grows beyond a small number of examples, to ensure accurate and efficient evaluation of prompt outputs. As the development set grows, the chapter recommends the automation of the testing process to efficiently evaluate prompt outputs, ensuring accuracy and streamlining the validation process.']}, {'end': 5645.311, 'start': 5381.4, 'title': 'Evaluating prompt performance', 'summary': "Discusses evaluating the prompt's performance using a set of 10 examples, achieving a 90% correctness in the development set, and the potential for further validation with additional examples or a holdout test set.", 'duration': 263.911, 'highlights': ["Evaluating Prompt Performance The chapter focuses on evaluating the prompt's performance using a set of 10 examples.", "90% Correctness in Development Set The evaluation results in a 90% correctness in the development set, indicating the prompt's effectiveness in providing the desired response.", 'Potential for Further Validation The discussion mentions the potential for further validation with additional examples or a holdout test set, emphasizing the importance of rigor in certain applications.']}], 'duration': 1101.476, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE4543835.jpg', 'highlights': ['By monitoring the quality of the system across a larger number of inputs, you can alter the steps and improve the overall performance of your system.', 'The process of building an application using an LLM involves iteratively tuning prompts on a small handful of examples and gradually adding additional test examples opportunistically, leading to the development of metrics to measure performance on a small set of examples.', 'The process of tuning a prompt for product retrieval is explained, involving the utilization of few-shot prompting for refinement. The chapter emphasizes the iterative process of tuning a prompt by providing examples and continuously refining it until it yields appropriate outputs, using a few-shot prompting technique for this purpose.', "Evaluating Prompt Performance The chapter focuses on evaluating the prompt's performance using a set of 10 examples.", "90% Correctness in Development Set The evaluation results in a 90% correctness in the development set, indicating the prompt's effectiveness in providing the desired response."]}, {'end': 6352.916, 'segs': [{'end': 5676.486, 'src': 'embed', 'start': 5645.311, 'weight': 1, 'content': [{'end': 5652.937, 'text': 'then of course it would be the responsible thing to do to actually get a much larger test set to really verify the performance before you use it anywhere.', 'start': 5645.311, 'duration': 7.626}, {'end': 5654.374, 'text': "So that's it.", 'start': 5653.673, 'duration': 0.701}, {'end': 5664.079, 'text': 'I find that the workflow of building applications using prompts is very different than a workflow of building applications using supervised learning,', 'start': 5654.594, 'duration': 9.485}, {'end': 5667.121, 'text': 'and the pace of iteration feels much faster.', 'start': 5664.079, 'duration': 3.042}, {'end': 5676.486, 'text': 'If you have not yet done it before, you might be surprised at how well an evaluation method built on just a few hand-curated tricky examples.', 'start': 5667.181, 'duration': 9.305}], 'summary': 'Using prompts for application development feels faster than supervised learning, requiring a larger test set for performance verification.', 'duration': 31.175, 'max_score': 5645.311, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE5645311.jpg'}, {'end': 5709.224, 'src': 'embed', 'start': 5684.493, 'weight': 0, 'content': [{'end': 5695.849, 'text': 'how effective adding a handful just a handful of tricky examples into your development sets might be in terms of helping you and your team get to an effective set of prompts in an effective system.', 'start': 5684.493, 'duration': 11.356}, {'end': 5704.239, 'text': 'In this video, the outputs could be evaluated quantitatively, as in.', 'start': 5697.754, 'duration': 6.485}, {'end': 5709.224, 'text': 'there was a desired output and you could tell if it gave this desired output or not.', 'start': 5704.239, 'duration': 4.985}], 'summary': 'Adding a handful of tricky examples can help in evaluating outputs quantitatively.', 'duration': 24.731, 'max_score': 5684.493, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE5684493.jpg'}, {'end': 5958.179, 'src': 'embed', 'start': 5932.349, 'weight': 3, 'content': [{'end': 5942.156, 'text': 'because even if you deploy 3.5 Turbo in production and generate a lot of text, if your evaluation is a more sporadic exercise,', 'start': 5932.349, 'duration': 9.807}, {'end': 5952.59, 'text': 'then it may be prudent to pay for the somewhat more expensive GPT-4 API call to get a more rigorous evaluation of the output.', 'start': 5942.156, 'duration': 10.434}, {'end': 5958.179, 'text': 'One design pattern that I hope you can take away from this is that we can specify a rubric,', 'start': 5953.612, 'duration': 4.567}], 'summary': 'Consider using gpt-4 api for rigorous evaluation of 3.5 turbo output.', 'duration': 25.83, 'max_score': 5932.349, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE5932349.jpg'}, {'end': 6269.27, 'src': 'embed', 'start': 6236.226, 'weight': 4, 'content': [{'end': 6246.149, 'text': "First is even without an expert provided ideal answer, if you can write a rubric, you can use one LLM to evaluate another LLM's output.", 'start': 6236.226, 'duration': 9.923}, {'end': 6251.253, 'text': 'Second, if you can provide an expert provided ideal answer,', 'start': 6247.209, 'duration': 4.044}, {'end': 6261.603, 'text': 'then that can help your LLM better compare if a specific assistant output is similar to the expert provided ideal answer.', 'start': 6251.253, 'duration': 10.35}, {'end': 6269.27, 'text': 'I hope that helps you to evaluate your LLM systems outputs so that,', 'start': 6261.623, 'duration': 7.647}], 'summary': 'Rubric can evaluate llm output, expert answer helps compare outputs.', 'duration': 33.044, 'max_score': 6236.226, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE6236226.jpg'}], 'start': 5645.311, 'title': 'Building effective applications using prompts and evaluating llm outputs', 'summary': 'Discusses the impact of adding tricky examples to the development set, emphasizing the surprising effectiveness of a small number of examples and provides quantitative assessment. it also explores evaluating llm outputs in ambiguous settings, considering the use of rubrics and comparing llm outputs to expert human-written responses, achieving a subset score of a and a disagreement score of d.', 'chapters': [{'end': 5709.224, 'start': 5645.311, 'title': 'Building effective applications using prompts', 'summary': 'Discusses the effectiveness of adding hand-curated tricky examples to the development set, highlighting the significant impact on the pace of iteration and the surprising effectiveness of a small number of examples. the evaluation method and the quantitative assessment of outputs are also emphasized.', 'duration': 63.913, 'highlights': ['The workflow of building applications using prompts differs significantly from building applications using supervised learning, with a much faster pace of iteration.', 'Adding a handful of tricky examples to the development sets can surprisingly be highly effective in helping teams create an effective set of prompts and a robust system.', 'A small number of hand-curated tricky examples can be surprisingly effective, even though statistically not valid for almost anything, in influencing the development of prompts.', 'The chapter emphasizes the importance of verifying the performance of models with a much larger test set before practical use.']}, {'end': 5980.789, 'start': 5709.724, 'title': 'Evaluating llm output', 'summary': 'Explores evaluating llm output in ambiguous settings, discussing the use of rubrics to assess the assistant answer and considering the potential use of gpt-4 for more robust evaluations.', 'duration': 271.065, 'highlights': ['The chapter discusses evaluating LLM output in ambiguous settings, focusing on the use of rubrics to assess the assistant answer.', 'It explains the process of creating a rubric to evaluate the assistant answer, specifying criteria such as factual content, style, grammar, and information coherence.', 'The potential use of GPT-4 for more robust evaluations is considered, highlighting the importance of rigorous evaluation, especially for production deployments.']}, {'end': 6352.916, 'start': 5981.29, 'title': 'Evaluating llm outputs', 'summary': "Discusses using a rubric to evaluate llm outputs, comparing them to expert human-written responses, and provides examples of an llm's evaluation using a rubric, achieving a subset score of a and a disagreement score of d, illustrating the importance of expert-provided ideal answers in evaluating llm outputs.", 'duration': 371.626, 'highlights': ["The chapter discusses using a rubric to evaluate LLM outputs, comparing them to expert human-written responses, and provides examples of an LLM's evaluation using a rubric, achieving a subset score of A and a disagreement score of D, illustrating the importance of expert-provided ideal answers in evaluating LLM outputs.", 'The chapter explains the rubric used for evaluating LLM outputs, instructing the LLM to compare the factual content of the submitted answer with the expert answer and outputting a score from A to E, depending on the consistency and agreement between the two answers.', "The chapter emphasizes the importance of providing an expert-provided ideal answer to help the LLM better compare if a specific assistant output is similar to the expert-provided ideal answer, ultimately leading to continuous evaluation and improvement of the LLM's performance."]}], 'duration': 707.605, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/C48GywChLeE/pics/C48GywChLeE5645311.jpg', 'highlights': ['A small number of hand-curated tricky examples can be surprisingly effective in influencing the development of prompts.', 'The workflow of building applications using prompts differs significantly from building applications using supervised learning, with a much faster pace of iteration.', 'The chapter emphasizes the importance of verifying the performance of models with a much larger test set before practical use.', 'The potential use of GPT-4 for more robust evaluations is considered, highlighting the importance of rigorous evaluation, especially for production deployments.', "The chapter emphasizes the importance of providing an expert-provided ideal answer to help the LLM better compare if a specific assistant output is similar to the expert-provided ideal answer, ultimately leading to continuous evaluation and improvement of the LLM's performance."]}], 'highlights': ['The chapter discusses best practices for building a complex application using an LLM, focusing on an end-to-end customer service assistance system.', 'Supervised learning is a core building block for training large language models by using it to repeatedly predict the next word.', 'Transition from base LLM to instruction-tuned LLM is significantly faster and requires fewer computational resources.', 'Human ratings are utilized to enhance the quality of LLM outputs.', 'The rapid development using prompting applies to unstructured data applications, particularly text applications, and may extend to vision applications.', "OpenAI's Moderation API helps identify and filter prohibited content in various categories, ensuring safe and responsible use of AI technology.", 'The model skips intermediary steps when a requested product is not available, demonstrating efficient processing.', 'Breaking down tasks into multiple prompts aids model reasoning and enables the use of external tools, such as looking up information in a product catalog or knowledge base, and calling APIs.', "Utilizing the 'generate output string' function to extract and organize product information from user messages, aiding the model in answering user questions.", 'By monitoring the quality of the system across a larger number of inputs, you can alter the steps and improve the overall performance of your system.', 'The process of tuning a prompt for product retrieval is explained, involving the utilization of few-shot prompting for refinement.', 'A small number of hand-curated tricky examples can be surprisingly effective in influencing the development of prompts.', 'The workflow of building applications using prompts differs significantly from building applications using supervised learning, with a much faster pace of iteration.', 'The potential use of GPT-4 for more robust evaluations is considered, highlighting the importance of rigorous evaluation, especially for production deployments.']}