title
20. LLMOps | Andrew Ng | DeepLearning.ai - Full Course

description
The course comes from [https://learn.deeplearning.ai/llmops/lesson/1/introduction](https://learn.deeplearning.ai/llmops/lesson/1/introduction), created by Andrew Ng. This course introduces the key concepts and best practices of LLMOps (Large Language Model Operations). LLMOps is an extension of machine learning operations (MLOps) to applications built on large language models, aiming to automate the cycle of data preparation, model tuning, deployment, maintenance, and monitoring. In this course, you will learn how to prepare data for large language models and perform model tuning using techniques like Parameter-Efficient Fine-Tuning (PEFT). You will also learn how to automate and orchestrate multiple LLM tuning workflows, including workflows over text datasets too large to fit in memory. The course also covers deploying models as REST APIs and calling those APIs, as well as techniques for building an overall LLMOps workflow. You will learn how to use SQL for data preparation, leverage open-source tools for orchestration and automation, and deploy models to production using Vertex AI. By the end of this course, you will be able to manage and operate AI production use cases more efficiently, without needing prior knowledge of MLOps.
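Before the transcript detail below, here is a minimal sketch of the loop the course automates, using the Vertex AI Python SDK: tune a PaLM-family foundation model on a JSONL dataset, then call the tuned model. The project, bucket, and file names are placeholders, and the `tune_model` arguments follow one version of the SDK's supervised-tuning interface, so treat this as an illustrative sketch rather than the course's exact notebook code.

```python
# Minimal sketch (assumes `pip install google-cloud-aiplatform` and GCP
# credentials; project and bucket names below are placeholders).
import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project="my-project", location="us-central1")

# Start from a foundation model and run a supervised tuning job on it.
model = TextGenerationModel.from_pretrained("text-bison@001")
model.tune_model(
    training_data="gs://my-bucket/tune_data.jsonl",  # one prompt/answer pair per line
    train_steps=200,
    tuning_job_location="europe-west4",
    tuned_model_location="us-central1",
)

# Call the tuned model the same way as the base model.
print(model.predict("Summarize this customer email: ...").text)
```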

detail
{'title': '20. LLMOps | Andrew Ng | DeepLearning.ai - Full Course', 'heatmap': [], 'summary': 'Course by andrew ng covers the challenges and best practices for developing and managing large language model (llm) applications, emphasizing the need for automated processes, efficient ai production management, mlops workflows, data preparation, model tuning, and ml model deployment.', 'chapters': [{'end': 112.296, 'segs': [{'end': 49.123, 'src': 'embed', 'start': 1.96, 'weight': 0, 'content': [{'end': 8.128, 'text': 'Welcome to LLM Ops, built in partnership with Google Cloud and taught by Erwin Huizenga.', 'start': 1.96, 'duration': 6.168}, {'end': 16.52, 'text': 'Say you design and deploy one LLM-based use case, such as summarizing customer emails, and it takes you a few weeks to do that.', 'start': 9.25, 'duration': 7.27}, {'end': 25.07, 'text': 'That probably took a bunch of work to select the large language model to build on, which might mean trying a few to see what works best,', 'start': 17.708, 'duration': 7.362}, {'end': 30.892, 'text': 'then tuning the prompts and setting up an evaluation framework, deploying and then monitoring performance.', 'start': 25.07, 'duration': 5.822}, {'end': 38.015, 'text': 'As you build an application, having automated ways to deploy and monitor will make your life easier.', 'start': 31.473, 'duration': 6.542}, {'end': 46.821, 'text': 'And if you need to update this application, maybe because the LLM provider is deprecating the model you had built on,', 'start': 38.955, 'duration': 7.866}, {'end': 49.123, 'text': 'and so you need to switch to a new LLM.', 'start': 46.821, 'duration': 2.302}], 'summary': 'Llm ops, in partnership with google cloud, teaches how to design and deploy llm-based use cases, taking a few weeks, involving model selection, tuning, deployment, and monitoring.', 'duration': 47.163, 'max_score': 1.96, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ1960.jpg'}, {'end': 91.811, 'src': 'embed', 'start': 69.21, 'weight': 1, 'content': [{'end': 76.935, 'text': 'LLMOps, which is an extension of MLOps or machine learning operations to applications built on large language models,', 'start': 69.21, 'duration': 7.725}, {'end': 85.941, 'text': 'describes the processes and tools to automate the cycle of data preparation, model tuning, deployment, maintenance and monitoring.', 'start': 76.935, 'duration': 9.006}, {'end': 89.144, 'text': 'Take just the example of prompt management.', 'start': 86.322, 'duration': 2.822}, {'end': 91.811, 'text': 'When designing prompts.', 'start': 90.051, 'duration': 1.76}], 'summary': 'LLMOps automates data prep, model tuning, deployment, and monitoring for large language models.', 'duration': 22.601, 'max_score': 69.21, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ69210.jpg'}], 'start': 1.96, 'title': 'Introduction to llm ops', 'summary': 'Explains the challenges of developing llm-based applications, emphasizing the need for automated processes in designing, deploying, and maintaining large language model applications, and the extension of mlops to llmops.', 'chapters': [{'end': 112.296, 'start': 1.96, 'title': 'Introduction to llm ops', 'summary': 'Explains the challenges of developing llm-based applications, emphasizing the need for automated processes in designing, deploying, and maintaining large language model applications, and the extension of mlops to llmops.', 'duration': 110.336, 'highlights':
['Automated processes for designing and deploying LLM-based applications are crucial to streamline workflow and enhance productivity, as demonstrated by the example of summarizing customer emails and handling prompt management.', 'The need for tools to automate the cycle of data preparation, model tuning, deployment, maintenance, and monitoring in LLM Ops is highlighted, emphasizing its extension from MLOps to applications based on large language models.', 'The importance of having automated ways to deploy and monitor applications, especially in scenarios where multiple LLM calls are involved, is emphasized to optimize performance and simplify workflow.']}], 'duration': 110.336, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ1960.jpg', 'highlights': ['Automated processes for designing and deploying LLM-based applications are crucial to streamline workflow and enhance productivity, as demonstrated by the example of summarizing customer emails and handling prompt management.', 'The need for tools to automate the cycle of data preparation, model tuning, deployment, maintenance, and monitoring in LLM Ops is highlighted, emphasizing its extension from MLOps to applications based on large language models.', 'The importance of having automated ways to deploy and monitor applications, especially in scenarios where multiple LLM calls are involved, is emphasized to optimize performance and simplify workflow.']}, {'end': 494.227, 'segs': [{'end': 186.809, 'src': 'embed', 'start': 161.717, 'weight': 0, 'content': [{'end': 167.56, 'text': 'You also learn how to automate and orchestrate the LLM tuning workflow for multiple use cases,', 'start': 161.717, 'duration': 5.843}, {'end': 172.382, 'text': 'including when you have huge text datasets that might be too large to fit in memory.', 'start': 167.56, 'duration': 4.822}, {'end': 177.504, 'text': 'You learn how to deploy your model as a REST API and call the API.', 'start': 172.622, 'duration': 4.882}, {'end': 180.466, 'text': "If you don't know what a REST API is, don't worry about it.", 'start': 177.825, 'duration': 2.641}, {'end': 182.067, 'text': 'Erwin will cover that too.', 'start': 180.906, 'duration': 1.161}, {'end': 186.809, 'text': 'You also learn how to build an overall LLMOps workflow.', 'start': 182.767, 'duration': 4.042}], 'summary': 'Learn to automate llm tuning workflow for large text datasets and deploy model as rest api.', 'duration': 25.092, 'max_score': 161.717, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ161717.jpg'}, {'end': 236.427, 'src': 'embed', 'start': 212.663, 'weight': 3, 'content': [{'end': 219.811, 'text': 'Effective management and operational strategies become really important if you want to ship your code to real users and do so efficiently.', 'start': 212.663, 'duration': 7.148}, {'end': 227.643, 'text': 'Building AI applications has become easier thanks to developments like foundation models as APIs and open source LLMs.', 'start': 220.939, 'duration': 6.704}, {'end': 236.427, 'text': "Now it's possible to build many different use cases quickly, so rather than carefully planning out something, you might even build multiple use cases.", 'start': 228.383, 'duration': 8.044}], 'summary': 'Efficiently ship code to real users with ai applications using foundation models and open source llms.', 'duration': 23.764, 'max_score': 212.663, 'thumbnail':
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ212663.jpg'}, {'end': 341.535, 'src': 'embed', 'start': 314.918, 'weight': 1, 'content': [{'end': 326.187, 'text': 'MLOps is an ML engineering culture and practice that aims at unifying ML development and ML operations.', 'start': 314.918, 'duration': 11.269}, {'end': 331.27, 'text': 'Automation and monitoring at all steps of the ML system is crucial.', 'start': 326.187, 'duration': 5.083}, {'end': 333.811, 'text': 'With automating meaning,', 'start': 331.71, 'duration': 2.101}, {'end': 341.535, 'text': 'maybe, if you have a use case where you want to deploy a large language model, you want to automate the process of data engineering,', 'start': 333.811, 'duration': 7.724}], 'summary': 'Ml ops unifies ml development and operations, emphasizes automation and monitoring for all ml system steps.', 'duration': 26.617, 'max_score': 314.918, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ314918.jpg'}], 'start': 112.296, 'title': 'Efficient ai production management', 'summary': 'Covers best practices for managing complexity in large language model-based applications, emphasizing unification of ml development and operations, with a focus on automation, orchestrating workflows, and surge of creativity driven by generative ai.', 'chapters': [{'end': 248.114, 'start': 112.296, 'title': 'Managing complexity in large language model ops', 'summary': 'Covers the latest best practices for managing complexity in large language model-based applications, including automating and orchestrating workflows, deploying models as rest apis, and utilizing sql for data preparation, with a focus on the surge of creativity and innovation in the developer community driven by generative ai and large language models.', 'duration': 135.818, 'highlights': ['The chapter covers the latest best practices for managing complexity in large language model-based applications, including automating and orchestrating workflows, deploying models as REST APIs, and utilizing SQL for data preparation. (Relevance: 5)', 'Erwin Huizenga, a developer advocate in machine learning at Google Cloud, will instruct on building an end-to-end workflow for large language model-based applications, including preparing data for tuning, automating and orchestrating workflows, and deploying models as REST APIs. (Relevance: 4)', 'The course includes learning how to automate and orchestrate the LLM tuning workflow for multiple use cases, including handling huge text datasets that may be too large to fit in memory. (Relevance: 3)', 'Developments like foundation models as APIs and open source LLMs have made building AI applications easier, leading to a surge in creativity and innovation within the developer community. (Relevance: 2)', 'Effective management and operational strategies become really important when shipping code to real users efficiently in the context of the surge of creativity and innovation within the developer community driven by generative AI and large language models.
(Relevance: 1)']}, {'end': 494.227, 'start': 248.974, 'title': 'Efficient management of ai production use cases', 'summary': 'Introduces the concepts and ideas of llm ops, emphasizing the unification of ml development and operations, automation, and orchestration to efficiently manage ai production use cases, with a focus on machine learning operations workflow and mlops framework.', 'duration': 245.253, 'highlights': ['LLM Ops involves unifying ML development and operations, automating and monitoring all steps of the ML system, and managing processes such as data engineering, training or tuning models, and deploying them in production.', 'The MLOps framework for bringing a large language model into production includes steps like data ingestion, data validation, model training/tuning, model analysis, model serving, and logging metrics related to the model in production.', 'Automation and orchestration are essential for efficiency in managing the machine learning operations workflow, reducing manual execution time and ensuring systematic step-by-step execution of processes.', 'Acknowledgment of contributors from Google Cloud and the DeepLearning.AI team, including Nikita Namjoshi, Dave Elliott, and Eddie Hsu, highlights the collaborative effort in creating the course, enhancing its credibility and expertise.']}], 'duration': 381.931, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ112296.jpg', 'highlights': ['The chapter covers best practices for managing complexity in large language model-based applications, emphasizing automation and orchestration of workflows. (Relevance: 5)', 'LLM Ops involves unifying ML development and operations, automating and monitoring all steps of the ML system, and managing processes such as data engineering, training or tuning models, and deploying them in production. (Relevance: 4)', 'The course includes learning how to automate and orchestrate the LLM tuning workflow for multiple use cases, including handling huge text datasets that may be too large to fit in memory. (Relevance: 3)', 'Developments like foundation models as APIs and open source LLMs have made building AI applications easier, leading to a surge in creativity and innovation within the developer community. (Relevance: 2)', 'Effective management and operational strategies become really important when shipping code to real users efficiently in the context of the surge of creativity and innovation within the developer community driven by generative AI and large language models.
(Relevance: 1)']}, {'end': 967.538, 'segs': [{'end': 631.043, 'src': 'embed', 'start': 544.217, 'weight': 1, 'content': [{'end': 551.083, 'text': "Let's go through a few examples of what are these differences between MLOps for LLMs and LLM system design.", 'start': 544.217, 'duration': 6.866}, {'end': 557.188, 'text': 'When dealing with MLOps for LLMs, you might have to think through how do you do experimentation.', 'start': 551.363, 'duration': 5.825}, {'end': 559.29, 'text': "There's many great models out there.", 'start': 557.388, 'duration': 1.902}, {'end': 564.775, 'text': "There's many great foundation models out there like Palm or Llama.", 'start': 559.61, 'duration': 5.165}, {'end': 571.088, 'text': 'You might want to experiment with multiple foundation models to understand which one is a good fit for your use case.', 'start': 565.126, 'duration': 5.962}, {'end': 572.589, 'text': "Let's say summarization.", 'start': 571.509, 'duration': 1.08}, {'end': 574.73, 'text': 'Then you might design multiple prompts.', 'start': 572.749, 'duration': 1.981}, {'end': 582.133, 'text': 'And then you have to think through how do you manage these prompts during experimentation, but also when you use prompts in production.', 'start': 575.49, 'duration': 6.643}, {'end': 584.479, 'text': 'you might want to use supervised tuning.', 'start': 582.577, 'duration': 1.902}, {'end': 588.983, 'text': "So let's say you want to tune your own model for your summarization use case.", 'start': 584.999, 'duration': 3.984}, {'end': 597.511, 'text': 'Maybe the Palm API works out of the box for you, but you want to improve it by using supervised fine tuning on summaries.', 'start': 589.363, 'duration': 8.148}, {'end': 604.04, 'text': 'Of course, we also need to think through monitoring our LLMs in production and how do we evaluate LLMs.', 'start': 597.995, 'duration': 6.045}, {'end': 608.364, 'text': "Evaluating LLMs is still a topic where there's a lot of research.", 'start': 604.561, 'duration': 3.803}, {'end': 612.208, 'text': 'So the way you evaluate your LLM might change over time.', 'start': 608.905, 'duration': 3.303}, {'end': 616.692, 'text': "When we look at LLM system design, there's other things that we have to think about.", 'start': 612.508, 'duration': 4.184}, {'end': 622.137, 'text': "Like, if we're building a summarization use case, how are we going to chain multiple steps together?", 'start': 616.932, 'duration': 5.205}, {'end': 625.479, 'text': "So let's say we have a lot of documents.", 'start': 622.637, 'duration': 2.842}, {'end': 626.3, 'text': 'we want to summarize.', 'start': 625.479, 'duration': 0.821}, {'end': 628.721, 'text': 'Too much for the LLM to process at once.', 'start': 626.66, 'duration': 2.061}, {'end': 631.043, 'text': 'So we have to summarize in batches.', 'start': 629.062, 'duration': 1.981}], 'summary': 'Mlops for llms focuses on experimentation, model tuning, and monitoring, while llm system design involves chaining multiple steps for tasks like summarization.', 'duration': 86.826, 'max_score': 544.217, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ544217.jpg'}, {'end': 798.941, 'src': 'embed', 'start': 694.908, 'weight': 7, 'content': [{'end': 700.692, 'text': 'You might want to use a foundation model out of the box, like the Palm API or a Lama model.', 'start': 694.908, 'duration': 5.784}, {'end': 706.376, 'text': 'When you get the response of your LLM, you might want to do some grounding again.', 'start': 701.352, 
'duration': 5.024}, {'end': 709.596, 'text': 'So check the response against the facts that you have.', 'start': 706.894, 'duration': 2.702}, {'end': 713.959, 'text': 'Once we have done our grounding, you might want to do some post-processing.', 'start': 710.176, 'duration': 3.783}, {'end': 719.062, 'text': 'You maybe want to clean up the response and give it a structure that is user-friendly.', 'start': 714.199, 'duration': 4.863}, {'end': 723.145, 'text': 'You might also want to include and think about responsible AI.', 'start': 719.422, 'duration': 3.723}, {'end': 725.987, 'text': 'We want to make sure that we build AI responsibly.', 'start': 723.365, 'duration': 2.622}, {'end': 731.07, 'text': 'Maybe you want to check for toxicity or any bias in the response of the LLM.', 'start': 726.287, 'duration': 4.783}, {'end': 734.312, 'text': "Anything that's important to you and your use case.", 'start': 731.43, 'duration': 2.882}, {'end': 741.236, 'text': "Once we have output we're happy with, This is what goes back to the user and the user sees the final output.", 'start': 734.773, 'duration': 6.463}, {'end': 744.821, 'text': 'Maybe you want to tune your own custom model.', 'start': 742.099, 'duration': 2.722}, {'end': 747.523, 'text': 'So you want to do model customization.', 'start': 745.441, 'duration': 2.082}, {'end': 754.648, 'text': 'When we do model customization, you have to go through the process of data preparation, tuning a model and, of course,', 'start': 747.843, 'duration': 6.805}, {'end': 759.431, 'text': 'evaluating and understanding how your tuned model is performing on your use case.', 'start': 754.648, 'duration': 4.783}, {'end': 761.813, 'text': 'And of course, this is an iterative process.', 'start': 759.772, 'duration': 2.041}, {'end': 767.417, 'text': "You might want to do this a couple of times until you have a summarization model that you're happy with.", 'start': 762.133, 'duration': 5.284}, {'end': 770.649, 'text': "Once you have a model you're happy with,", 'start': 768.268, 'duration': 2.381}, {'end': 777.892, 'text': 'you can deploy that model into your production environment and you have a fine-tuned model you can use in your LLM-driven application.', 'start': 770.649, 'duration': 7.243}, {'end': 779.693, 'text': 'This is an example workflow.', 'start': 778.192, 'duration': 1.501}, {'end': 784.475, 'text': 'And, of course, depending on your use case, maybe you have something else in summarization.', 'start': 780.093, 'duration': 4.382}, {'end': 790.017, 'text': 'depending on your requirements and your use case, you might have a different use case than summarization.', 'start': 784.475, 'duration': 5.542}, {'end': 791.998, 'text': 'you can take a different approach.', 'start': 790.017, 'duration': 1.981}, {'end': 795.319, 'text': 'So this can look totally different for your use case.', 'start': 792.638, 'duration': 2.681}, {'end': 798.941, 'text': "In this course, we'll focus on the ones that are in green.", 'start': 795.699, 'duration': 3.242}], 'summary': 'Workflow involves model customization, responsible ai, and iterative refining for user-friendly output.', 'duration': 104.033, 'max_score': 694.908, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ694908.jpg'}, {'end': 848.701, 'src': 'embed', 'start': 824.032, 'weight': 4, 'content': [{'end': 831.355, 'text': "We're going to start with preparing our data sets and we're going to version our data sets so that we can keep track of the data sets that we 
created.", 'start': 824.032, 'duration': 7.323}, {'end': 838.813, 'text': "Next, we're going to design a pipeline that's going to do supervised tuning of a large language model for us.", 'start': 831.696, 'duration': 7.117}, {'end': 841.035, 'text': 'And this might look like a simple box.', 'start': 839.173, 'duration': 1.862}, {'end': 845.739, 'text': 'A lot of great things happen in this pipeline, and that is automated for us.', 'start': 841.735, 'duration': 4.004}, {'end': 848.701, 'text': "Next, we're going to generate an artifact.", 'start': 846.559, 'duration': 2.142}], 'summary': 'Preparing and versioning datasets, designing a supervised tuning pipeline, and generating an artifact.', 'duration': 24.669, 'max_score': 824.032, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ824032.jpg'}, {'end': 946.26, 'src': 'embed', 'start': 896.454, 'weight': 0, 'content': [{'end': 903.966, 'text': 'Once we have the response of the LLM, we can use responsible AI to check the safety using safety scores.', 'start': 896.454, 'duration': 7.512}, {'end': 909.03, 'text': 'Two essential topics for LLM ops are orchestration.', 'start': 904.566, 'duration': 4.464}, {'end': 913.934, 'text': "So we're going to talk about how you can orchestrate a pipeline,", 'start': 909.691, 'duration': 4.243}, {'end': 921.68, 'text': 'meaning maybe you want to do data preparation and prepare a data set first before you do supervised tuning, before you deploy a model.', 'start': 913.934, 'duration': 7.746}, {'end': 925.563, 'text': "So you're going to orchestrate the steps in your pipeline.", 'start': 921.98, 'duration': 3.583}, {'end': 927.405, 'text': 'Now, once we have that pipeline.', 'start': 925.823, 'duration': 1.582}, {'end': 937.099, 'text': 'Secondly, we want to talk about automation and how we can automate our pipeline to make our life easier as developers.', 'start': 928.297, 'duration': 8.802}, {'end': 941.059, 'text': 'This LLMOps pipeline is a simplified diagram.', 'start': 937.459, 'duration': 3.6}, {'end': 946.26, 'text': 'Depending on your use case, on your requirements, this can differ.', 'start': 941.82, 'duration': 4.44}], 'summary': 'Using responsible ai to check safety using safety scores and orchestrating an llmops pipeline for automation.', 'duration': 49.806, 'max_score': 896.454, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ896454.jpg'}], 'start': 494.227, 'title': 'Mlops and workflow for large language models', 'summary': 'Discusses the differences between mlops for large language models (llms) and llm system design, focusing on experimentation, supervised tuning, monitoring, and evaluation for llms, as well as the workflow for tuning a language model, including pre-processing, grounding, post-processing, responsible ai considerations, model customization, and deployment.', 'chapters': [{'end': 675.571, 'start': 494.227, 'title': 'Mlops for large language models', 'summary': 'Discusses the differences between mlops for large language models (llms) and llm system design, focusing on the narrow focus of llmops and the broader application considerations of llm system design, with examples of experimentation, supervised tuning, monitoring, and evaluation for llms, as well as the steps involved in llm system design.', 'duration': 181.344, 'highlights': ['The differences between MLOps for Large Language Models (LLMs) and LLM system design, including the narrow focus of LLMOps and the 
broader application considerations of LLM system design, with examples of experimentation, supervised tuning, monitoring, and evaluation for LLMs, as well as the steps involved in LLM system design.', 'The need to experiment with multiple foundation models and prompts for LLMs to find a good fit for a specific use case, such as summarization.', 'The consideration of supervised fine tuning on summaries to improve the performance of the model for a specific use case.', 'The importance of monitoring and evaluating LLMs in production, as well as the evolving nature of LLM evaluation over time.', 'The various considerations involved in LLM system design, such as chaining multiple steps together for processing large amounts of data, grounding for additional information, and tracking the history of created summaries.', 'The high-level example of a LLM-driven application, outlining the user interface, backend processing, and pre-processing steps for tasks such as summarization.']}, {'end': 798.941, 'start': 675.831, 'title': 'Workflow for llm model tuning', 'summary': 'Outlines the workflow for tuning a language model, including pre-processing, grounding, post-processing, responsible ai considerations, model customization, and deployment, emphasizing an iterative process and adaptability to different use cases.', 'duration': 123.11, 'highlights': ['The chapter covers the workflow for tuning a language model, including pre-processing, grounding, post-processing, responsible AI considerations, model customization, and deployment.', 'The process involves iterative steps such as data preparation, model tuning, and evaluation to achieve a satisfactory summarization model.', "Emphasis is placed on responsible AI, including checking for toxicity and bias in the model's responses, aligning with the goal of building AI responsibly.", 'The chapter suggests the usage of foundation models like the PaLM API or Llama model for LLM, and the need to ensure user-friendly response structures through post-processing.', 'The workflow also highlights the adaptability to different use cases, with the recognition that the approach may vary based on specific requirements.', 'The chapter also emphasizes the iterative nature of the process, indicating that the model tuning may need to be repeated several times until a satisfactory model is achieved.']}, {'end': 967.538, 'start': 799.441, 'title': 'Llmops pipeline workflow', 'summary': 'Discusses the llmops pipeline workflow, including data preparation, supervised tuning, model deployment, prompt usage, and safety checks, aiming to automate and orchestrate the pipeline to deploy and get predictions from the large language model (llm).', 'duration': 168.097, 'highlights': ['The LLMOps pipeline involves data preparation, versioning, supervised tuning, artifact generation, model deployment, prompt usage, and safety evaluation.', 'Orchestration and automation are essential for the LLMOps pipeline to streamline the workflow and deployment process.', 'The LLMOps pipeline provides a simplified diagram for operationalizing the Large Language Model (LLM) but can be customized based on specific use cases and requirements.']}], 'duration': 473.311, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ494227.jpg', 'highlights': ['The LLMOps pipeline involves data preparation, versioning, supervised tuning, artifact generation, model deployment, prompt usage, and safety evaluation.', 'The differences between MLOps for Large Language
Models (LLMs) and LLM system design, including the narrow focus of LLMOps and the broader application considerations of LLM system design, with examples of experimentation, supervised tuning, monitoring, and evaluation for LLMs, as well as the steps involved in LLM system design.', 'The need to experiment with multiple foundation models and prompts for LLMs to find a good fit for a specific use case, such as summarization.', 'The consideration of supervised fine tuning on summaries to improve the performance of the model for a specific use case.', 'The chapter covers the workflow for tuning a language model, including pre-processing, grounding, post-processing, responsible AI considerations, model customization, and deployment.', 'The importance of monitoring and evaluating LLMs in production, as well as the evolving nature of LLM evaluation over time.', 'The various considerations involved in LLM system design, such as chaining multiple steps together for processing large amounts of data, grounding for additional information, and tracking the history of created summaries.', 'The process involves iterative steps such as data preparation, model tuning, and evaluation to achieve a satisfactory summarization model.', "Emphasis is placed on responsible AI, including checking for toxicity and bias in the model's responses, aligning with the goal of building AI responsibly.", 'The chapter suggests the usage of foundation models like the PaLM API or Llama model for LLM, and the need to ensure user-friendly response structures through post-processing.', 'Orchestration and automation are essential for the LLMOps pipeline to streamline the workflow and deployment process.', 'The workflow also highlights the adaptability to different use cases, with the recognition that the approach may vary based on specific requirements.', 'The high-level example of a LLM-driven application, outlining the user interface, backend processing, and pre-processing steps for tasks such as summarization.', 'The chapter also emphasizes the iterative nature of the process, indicating that the model tuning may need to be repeated several times until a satisfactory model is achieved.', 'The LLMOps pipeline provides a simplified diagram for operationalizing the Large Language Model (LLM) but can be customized based on specific use cases and requirements.']}, {'end': 1892.427, 'segs': [{'end': 1027.02, 'src': 'embed', 'start': 997.649, 'weight': 0, 'content': [{'end': 1002.251, 'text': 'One of the key steps of LLM Ops is dealing with text data and lots of it.', 'start': 997.649, 'duration': 4.602}, {'end': 1010.516, 'text': "In this lesson, you'll learn how to retrieve Stack Overflow text data from a data warehouse, dealing with data that's too large to fit in memory,", 'start': 1003.152, 'duration': 7.364}, {'end': 1014.538, 'text': 'using SQL and modify the data to tune a model to be more task specific.', 'start': 1010.516, 'duration': 4.022}, {'end': 1018.813, 'text': 'In order to run this lab, we have to go through some setup code.', 'start': 1015.43, 'duration': 3.383}, {'end': 1020.875, 'text': 'We have to set up authentication.', 'start': 1019.073, 'duration': 1.802}, {'end': 1027.02, 'text': 'We have to say, this is who I am, and I have permissions to access this service and the data in the cloud.', 'start': 1021.195, 'duration': 5.825}], 'summary': 'Llm ops involves dealing with large text data from stack overflow stored in a data warehouse, requiring sql for data retrieval and modification to tune a model.',
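The data-warehouse setup described in this lesson reduces to a few client calls. A minimal sketch, assuming `google-cloud-bigquery` is installed and that `PROJECT_ID` and `credentials` come from the authentication step mentioned above:

```python
from google.cloud import bigquery

# Same initialization pattern as the Vertex AI client: project + credentials.
bq_client = bigquery.Client(project=PROJECT_ID, credentials=credentials)

# The query runs inside the data warehouse; only the (small) result set
# is pulled into local memory as a DataFrame.
query = """
SELECT title, body
FROM `bigquery-public-data.stackoverflow.posts_questions`
LIMIT 10
"""
df = bq_client.query(query).result().to_dataframe()
```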
'duration': 29.371, 'max_score': 997.649, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ997649.jpg'}, {'end': 1178.091, 'src': 'embed', 'start': 1135.817, 'weight': 1, 'content': [{'end': 1138.539, 'text': 'You might have used Pandas, so SQL versus Pandas.', 'start': 1135.817, 'duration': 2.722}, {'end': 1143.822, 'text': 'So Pandas is great if the data fits in memory on your computer,', 'start': 1139.079, 'duration': 4.743}, {'end': 1148.365, 'text': 'and SQL is great if your data sits in a data warehouse and you might want to process at scale.', 'start': 1143.822, 'duration': 4.543}, {'end': 1155.771, 'text': 'SQL Plus Data Warehouse is a powerful tool as we discussed when building ML Ops systems for large language models.', 'start': 1149.205, 'duration': 6.566}, {'end': 1157.773, 'text': "We want to make sure it's scalable.", 'start': 1156.212, 'duration': 1.561}, {'end': 1159.355, 'text': "Okay, let's get started.", 'start': 1158.394, 'duration': 0.961}, {'end': 1163.959, 'text': 'In order to interact with our data warehouse, we again have to initialize a library.', 'start': 1159.655, 'duration': 4.304}, {'end': 1170.845, 'text': "We're going to use the BigQuery client, and we're going to initialize it in the same way as with the Vertex AI client.", 'start': 1164.299, 'duration': 6.546}, {'end': 1178.091, 'text': "So we're going to import the BigQuery library, And then we're going to initialize the project and make sure we're using our credentials.", 'start': 1170.865, 'duration': 7.226}], 'summary': 'Pandas for in-memory data, sql for data warehouse processing at scale. using bigquery for scalable data interactions.', 'duration': 42.274, 'max_score': 1135.817, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ1135817.jpg'}, {'end': 1621.728, 'src': 'embed', 'start': 1587.255, 'weight': 2, 'content': [{'end': 1593.798, 'text': 'In the following example, you will combine two tables to get a question and the answer that we need for.', 'start': 1587.255, 'duration': 6.543}, {'end': 1601.841, 'text': "We're going to use a where clause that allows us to filter the results based on a specific condition,", 'start': 1594.938, 'duration': 6.903}, {'end': 1605.182, 'text': 'ensuring that only the relevant data that we need is returned.', 'start': 1601.841, 'duration': 3.341}, {'end': 1610.624, 'text': 'This can significantly improve performance, especially when dealing with these large datasets.', 'start': 1605.502, 'duration': 5.122}, {'end': 1621.728, 'text': "Okay Let's get our query and I'll talk you through it in more detail so you understand what we're actually returning from the two tables.", 'start': 1611.184, 'duration': 10.544}], 'summary': 'Combining tables using a where clause for improved performance with large datasets.', 'duration': 34.473, 'max_score': 1587.255, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ1587255.jpg'}, {'end': 1803.821, 'src': 'embed', 'start': 1773.868, 'weight': 4, 'content': [{'end': 1780.513, 'text': 'Fine-tuning language models on a collection of datasets, phrased as instructions has been shown,', 'start': 1773.868, 'duration': 6.645}, {'end': 1785.998, 'text': 'to improve model performance and generalization to unseen tasks.', 'start': 1780.513, 'duration': 5.485}, {'end': 1794.64, 'text': 'An instruction refers to a specific direction or guideline that conveys a task or action to be 
executed.', 'start': 1786.838, 'duration': 7.802}, {'end': 1800.581, 'text': "Basically, you're telling the large language model what to do.", 'start': 1795.42, 'duration': 5.161}, {'end': 1803.821, 'text': 'These instructions can be expressed in various forms.', 'start': 1800.921, 'duration': 2.9}], 'summary': 'Fine-tuning language models on diverse datasets improves performance and generalization for unseen tasks.', 'duration': 29.953, 'max_score': 1773.868, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ1773868.jpg'}], 'start': 968.019, 'title': 'Data preparation and query optimization for llm ops', 'summary': 'Covers the importance of data preparation, setting up authentication, and optimizing data queries for llm ops, emphasizing dealing with large text data, accessing and visualizing large datasets, and providing instructions for language models, resulting in performance improvement and generalization.', 'chapters': [{'end': 1020.875, 'start': 968.019, 'title': 'Model evaluation and data preparation for llm ops', 'summary': 'Introduces model evaluation briefly and emphasizes the importance of data preparation for llm ops, focusing on dealing with large text data and setting up authentication.', 'duration': 52.856, 'highlights': ['Dealing with text data and lots of it is one of the key steps of LLM Ops, including retrieving Stack Overflow text data from a data warehouse, using SQL to handle large data, and modifying data to tune a model (quantifiable data: dealing with large text data).', 'The chapter briefly discusses model evaluation and emphasizes the importance of data preparation for LLM Ops, setting up authentication being a crucial step for running the lab (quantifiable data: setting up authentication).']}, {'end': 1568.349, 'start': 1021.195, 'title': 'Setting up data warehouse and accessing large data sets', 'summary': 'Covers setting up authentication, initializing vertex ai and bigquery libraries, and accessing and visualizing data from a data warehouse, emphasizing the importance of sql for efficient processing and the challenges of dealing with large datasets.', 'duration': 547.154, 'highlights': ['The importance of setting up authentication and initializing the Vertex AI and BigQuery libraries for interacting with cloud services and data.', 'Explanation of the role of a data warehouse as a central repository for storing and analyzing large amounts of data from various sources, with a focus on using BigQuery as the data warehouse.', 'Using SQL for efficient processing and data preparation, highlighting its advantages over Pandas for processing at scale and its significance in building ML Ops systems for large language models.', 'Challenges and best practices for dealing with large datasets, including the limitations of local memory and the recommendation to perform processing through SQL in the data warehouse instead of exporting all the data and considerations for storing and accessing training data for efficient training at scale.']}, {'end': 1892.427, 'start': 1568.349, 'title': 'Optimizing data queries and adding instructions', 'summary': 'Explains how to optimize data queries for large datasets by combining tables, using where clauses, and setting limits, resulting in a performance improvement. 
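Concretely, the combine-and-filter query described above can look like this sketch against the public Stack Overflow dataset (`bq_client` carries over from the previous sketch; the instruction wording is illustrative):

```python
query = """
SELECT q.title, q.body AS input_text, a.body AS output_text
FROM `bigquery-public-data.stackoverflow.posts_questions` AS q
JOIN `bigquery-public-data.stackoverflow.posts_answers` AS a
  ON q.accepted_answer_id = a.id          -- combine the two tables
WHERE q.accepted_answer_id IS NOT NULL    -- return only the relevant rows
LIMIT 10000                               -- cap the result size up front
"""
df = bq_client.query(query).result().to_dataframe()

# Prepend an instruction so the model is told what task to execute.
INSTRUCTION = "Please answer the following Stack Overflow question: "
df["input_text_instruct"] = INSTRUCTION + df["input_text"]
```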
it also details the importance of providing instructions to language models for better model performance and generalization, including specific examples of instruction templates.', 'duration': 324.078, 'highlights': ['The importance of optimizing data queries for large datasets is emphasized, with the explanation of combining tables, using where clauses, and setting limits for improved performance.', 'Providing instructions to language models is crucial for better model performance and generalization, with specific examples of instruction templates and their impact on model understanding and task execution.', 'Explaining the process of combining instruction templates with question input text to create a new column for enhanced dataset utilization and model guidance.']}], 'duration': 924.408, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ968019.jpg', 'highlights': ['Dealing with text data and lots of it is crucial for LLM Ops, including retrieving Stack Overflow text data from a data warehouse, using SQL to handle large data, and modifying data to tune a model (dealing with large text data).', 'The importance of setting up authentication and initializing the Vertex AI and BigQuery libraries for interacting with cloud services and data.', 'The importance of optimizing data queries for large datasets is emphasized, with the explanation of combining tables, using where clauses, and setting limits for improved performance.', 'Using SQL for efficient processing and data preparation, highlighting its advantages over Pandas for processing at scale and its significance in building ML Ops systems for large language models.', 'Providing instructions to language models is crucial for better model performance and generalization, with specific examples of instruction templates and their impact on model understanding and task execution.']}, {'end': 2289.477, 'segs': [{'end': 1962.699, 'src': 'embed', 'start': 1892.967, 'weight': 0, 'content': [{'end': 1904.597, 'text': 'Next, you will divide the data into a training and evaluation set, where evaluation will be used as unseen data during tuning to evaluate performance.', 'start': 1892.967, 'duration': 11.63}, {'end': 1914.891, 'text': "We're going to use scikit-learn train test split in order to split the pandas data frame into a train and evaluation set.", 'start': 1905.097, 'duration': 9.794}, {'end': 1921.26, 'text': 'The data is divided into training and evaluation with 80-20 split by default.', 'start': 1916.857, 'duration': 4.403}, {'end': 1924.041, 'text': 'We want to have a bit more data for our tuning.', 'start': 1921.56, 'duration': 2.481}, {'end': 1927.623, 'text': 'Play a bit with this and you can change it to whatever you like.', 'start': 1924.221, 'duration': 3.402}, {'end': 1934.447, 'text': 'We also going to use a random state parameter to initialize the random number generator.', 'start': 1927.923, 'duration': 6.524}, {'end': 1940.01, 'text': 'We want to use random sampling to make sure that we do a fair comparison of our model.', 'start': 1934.927, 'duration': 5.083}, {'end': 1941.931, 'text': 'Let me make an important point.', 'start': 1940.591, 'duration': 1.34}, {'end': 1952.169, 'text': "So keep your parameters as consistent as possible across your experiments so that you're able to do a fair comparison.", 'start': 1943.012, 'duration': 9.157}, {'end': 1960.837, 'text': 'With experiment meaning when you train your model or when you run your end-to-end workflow from your data 
to your model tuning.', 'start': 1953.05, 'duration': 7.787}, {'end': 1962.699, 'text': 'If you change too many parameters.', 'start': 1961.137, 'duration': 1.562}], 'summary': 'Data is divided into 80-20 training and evaluation split by default, with the option to adjust for tuning purposes and to ensure fair model comparison.', 'duration': 69.732, 'max_score': 1892.967, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ1892967.jpg'}, {'end': 2062.976, 'src': 'embed', 'start': 2035.117, 'weight': 1, 'content': [{'end': 2039.12, 'text': "Well, there's many options, and I want to talk about a few of the key ones.", 'start': 2035.117, 'duration': 4.003}, {'end': 2043.964, 'text': "So this example, we're going to use a JSON line format, JSON-L format.", 'start': 2039.641, 'duration': 4.323}, {'end': 2050.308, 'text': "It's a simple text-based format where each question and answer will be a row.", 'start': 2044.224, 'duration': 6.084}, {'end': 2056.212, 'text': "It's very human readable and it's an ideal choice for a small to medium-sized data set.", 'start': 2050.728, 'duration': 5.484}, {'end': 2062.976, 'text': 'If you have larger data sets, you can use a binary file format like a TFRecord or a Parquet file.', 'start': 2056.532, 'duration': 6.444}], 'summary': 'The json-l format is ideal for small to medium-sized data sets, while binary file formats like TFRecord or Parquet are suitable for larger data sets.', 'duration': 27.859, 'max_score': 2035.117, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ2035117.jpg'}, {'end': 2133.624, 'src': 'embed', 'start': 2104.73, 'weight': 2, 'content': [{'end': 2111.151, 'text': 'It will be important that you keep track of the different artifacts and that you also do versioning of your data.', 'start': 2104.73, 'duration': 6.421}, {'end': 2119.594, 'text': 'So one example: you might want to know from which data set from your data warehouse you generated your data file,', 'start': 2111.371, 'duration': 8.223}, {'end': 2126.138, 'text': "which is also important for having reproducibility and maintainability.", 'start': 2120.574, 'duration': 5.564}, {'end': 2127.119, 'text': 'And, of course,', 'start': 2126.459, 'duration': 0.66}, {'end': 2133.624, 'text': "you want to make sure your colleagues can also understand what's going on if they have to take over some of the work or have to help you.", 'start': 2127.119, 'duration': 6.505}], 'summary': 'Track artifacts and do versioning for reproducibility and maintainability.', 'duration': 28.894, 'max_score': 2104.73, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ2104730.jpg'}], 'start': 1892.967, 'title': 'Data splitting, model tuning, format, and versioning', 'summary': 'Covers data splitting using an 80-20 split, random sampling for fair model comparison, consistent parameters for model tuning, the significance of data format and versioning in mlops workflow, recommendation of json line format and binary file formats for efficient training, and emphasizing versioning for reproducibility and maintainability.', 'chapters': [{'end': 1991.848, 'start': 1892.967, 'title': 'Data splitting and model tuning', 'summary': 'Covers the process of dividing data into training and evaluation sets using an 80-20 split, using random sampling for fair model comparison, and the importance of keeping parameters consistent for a fair
comparison when tuning models.', 'duration': 98.881, 'highlights': ["The data is divided into training and evaluation with 80-20 split by default. We're going to use scikit-learn train test split in order to split the pandas data frame into a train and evaluation set. (Quantifiable: 80-20 split)", 'We want to use random sampling to make sure that we do a fair comparison of our model. (Key point: Fair model comparison)', 'With experiment meaning when you train your model or when you run your end-to-end workflow from your data to your model tuning. (Key point: Importance of consistent parameters for fair comparison)']}, {'end': 2289.477, 'start': 1992.188, 'title': 'Data format and versioning', 'summary': 'Discusses the importance of data format and versioning in mlops workflow, recommending json line format and binary file formats like tf record and parquet for efficient training, emphasizing the significance of versioning for reproducibility and maintainability, and detailing the process of generating and tracking the jsonl file.', 'duration': 297.289, 'highlights': ['The importance of versioning for reproducibility and maintainability is emphasized, suggesting using timestamps and dataset names for tracking, to ensure that the generated files can be traced back to the original data set and time of generation.', 'Recommendations for data format include JSON line format for small to medium-sized data sets due to its human readability, and binary file formats like TF record and parquet for efficient training, with TF record being ideal for computers and parquet files being efficient for large and complex data sets.', 'The chapter advises storing data from a data warehouse in files on SSD or cloud storage to facilitate efficient reading during training or tuning, with best practices suggesting using SSD environment or cloud storage bucket for large files or multiple generated files.', 'It is mentioned that accuracy is not calculated for text use case due to the ambiguity of text, which makes it challenging to measure accuracy over text data.']}], 'duration': 396.51, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ1892967.jpg', 'highlights': ['The data is divided into training and evaluation with 80-20 split by default. (Quantifiable: 80-20 split)', 'Recommendations for data format include JSON line format for small to medium-sized data sets. (Key point: Data format recommendation)', 'The importance of versioning for reproducibility and maintainability is emphasized. (Key point: Versioning importance)', 'We want to use random sampling to make sure that we do a fair comparison of our model. (Key point: Fair model comparison)', 'With experiment meaning when you train your model or when you run your end-to-end workflow. 
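Taken together, the practices covered in this chapter (a fixed-seed 80/20 split, JSONL serialization, and a timestamped file name for versioning) fit in a few lines; the column names carry over from the earlier query sketch:

```python
from datetime import datetime
from sklearn.model_selection import train_test_split

# 80/20 split; keep random_state fixed so experiments stay comparable.
train, evaluation = train_test_split(df, test_size=0.2, random_state=42)

# Timestamp the artifacts so each file can be traced back to when (and
# from which dataset) it was generated.
date = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
cols = ["input_text_instruct", "output_text"]

# JSONL: one JSON object per line; human-readable, fine for small/medium data.
train[cols].to_json(f"tune_data_{date}.jsonl", orient="records", lines=True)
evaluation[cols].to_json(f"eval_data_{date}.jsonl", orient="records", lines=True)
```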
(Key point: Consistent parameters importance)']}, {'end': 3550.767, 'segs': [{'end': 2361.242, 'src': 'embed', 'start': 2333.5, 'weight': 2, 'content': [{'end': 2339.807, 'text': "I've talked about system design for LLMs, and I've also talked about these workflows that are more one dimensional.", 'start': 2333.5, 'duration': 6.307}, {'end': 2342.75, 'text': "Let's say you want to train or tune a model.", 'start': 2340.427, 'duration': 2.323}, {'end': 2347.935, 'text': 'The process you typically go through is you gather some training data, you do training,', 'start': 2342.99, 'duration': 4.945}, {'end': 2354.558, 'text': 'you do evaluation and you do evaluation during the training or tuning process, as we talked about in the previous lab.', 'start': 2347.935, 'duration': 6.623}, {'end': 2361.242, 'text': 'And then as a result, you will have a trained model, which could be something like a TensorFlow SavedModel format.', 'start': 2354.738, 'duration': 6.504}], 'summary': 'Discussed system design and one-dimensional workflows for model training and tuning, involving data gathering, training, and evaluation.', 'duration': 27.742, 'max_score': 2333.5, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ2333500.jpg'}, {'end': 2429.455, 'src': 'embed', 'start': 2397.535, 'weight': 3, 'content': [{'end': 2400.336, 'text': "There's orchestration, automation, and deployment.", 'start': 2397.535, 'duration': 2.801}, {'end': 2408.921, 'text': 'Orchestration means that you specify which step needs to be run first, and then what is the next step, so on, so forth.', 'start': 2400.576, 'duration': 8.345}, {'end': 2412.743, 'text': 'Automation means that you automate this workflow.', 'start': 2409.441, 'duration': 3.302}, {'end': 2418.707, 'text': 'This helps you, for example, when you want to train a new model: you can rerun the end-to-end workflow again.', 'start': 2412.983, 'duration': 5.724}, {'end': 2424.431, 'text': "And deployment, as I talked about, means taking your trained model and putting it into your production environment.", 'start': 2418.967, 'duration': 5.464}, {'end': 2429.455, 'text': "Okay, so just about orchestration, it's about orchestrating the sequence of steps.", 'start': 2424.711, 'duration': 4.744}], 'summary': 'Orchestration, automation, and deployment streamline workflows for model training and deployment.', 'duration': 31.92, 'max_score': 2397.535, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ2397535.jpg'}, {'end': 2506.739, 'src': 'embed', 'start': 2481.565, 'weight': 0, 'content': [{'end': 2489.891, 'text': 'Kubeflow Pipelines is an open-source framework, like a kit for building machine learning pipelines, that you use to make orchestration and automation easy.', 'start': 2481.565, 'duration': 8.326}, {'end': 2492.392, 'text': "So first I'm going to import the DSL.", 'start': 2489.891, 'duration': 2.161}, {'end': 2504.04, 'text': "We're going to use the DSL like a drawing board to design our pipeline.", 'start': 2496.115, 'duration': 7.925}, {'end': 2506.739, 'text': "And then also we're going to use a compiler.", 'start': 2504.878, 'duration': 1.861}], 'summary': 'Kubeflow pipelines is an open source framework for constructing machine learning pipelines, utilizing dsl for pipeline design and a compiler.',
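The DSL-plus-compiler idea is easiest to see in code. A toy two-step pipeline in the Kubeflow Pipelines (kfp v2) style, where the execution order falls out of the data flow between components; the component bodies are illustrative:

```python
from kfp import compiler, dsl

@dsl.component
def say_hello(name: str) -> str:
    # A component is a self-contained step; this one just builds a string.
    return f"Hello, {name}!"

@dsl.component
def how_are_you(hello_text: str) -> str:
    return f"{hello_text} How are you?"

@dsl.pipeline(name="hello-pipeline")
def hello_pipeline(recipient: str = "World"):
    hello_task = say_hello(name=recipient)
    # Consuming hello_task.output is what tells the orchestrator
    # that say_hello must run first.
    how_are_you(hello_text=hello_task.output)

# "Compile" the pipeline into a YAML definition that a runner can execute.
compiler.Compiler().compile(pipeline_func=hello_pipeline, package_path="pipeline.yaml")
```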
'duration': 25.174, 'max_score': 2481.565, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ2481565.jpg'}, {'end': 2702.087, 'src': 'embed', 'start': 2671.966, 'weight': 1, 'content': [{'end': 2674.228, 'text': 'You may have not heard about a container.', 'start': 2671.966, 'duration': 2.262}, {'end': 2682.194, 'text': 'So containers are like a contained environment that have your dependencies and your software code.', 'start': 2674.828, 'duration': 7.366}, {'end': 2690.199, 'text': "The advantage of containers is that you don't need to manage a server, install your OS,", 'start': 2682.734, 'duration': 7.465}, {'end': 2693.281, 'text': 'install your dependencies there and your software on the server.', 'start': 2690.199, 'duration': 3.082}, {'end': 2698.304, 'text': "It lives in this bubble where you have an OS that's already available for you.", 'start': 2693.721, 'duration': 4.583}, {'end': 2702.087, 'text': 'you install only the dependencies that you can use for your software.', 'start': 2698.304, 'duration': 3.783}], 'summary': 'Containers provide contained environment with os and software dependencies, eliminating server management.', 'duration': 30.121, 'max_score': 2671.966, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ2671966.jpg'}, {'end': 3241.046, 'src': 'embed', 'start': 3194.208, 'weight': 4, 'content': [{'end': 3200.577, 'text': "Let's now look at a real-life example of a machine learning workflow, a machine learning pipeline.", 'start': 3194.208, 'duration': 6.369}, {'end': 3206.766, 'text': "The advantage or one of the advantages of a pipeline is that you're able to reuse it.", 'start': 3201.058, 'duration': 5.708}, {'end': 3212.553, 'text': 'So once you build a pipeline, you can reuse it so maybe you can share it with a colleague.', 'start': 3206.946, 'duration': 5.607}, {'end': 3218.361, 'text': "So let's say you've built a Q&A language model and you build a pipeline that processes the data,", 'start': 3212.734, 'duration': 5.627}, {'end': 3222.086, 'text': 'trains a Q&A language model and then outputs a trained model file.', 'start': 3218.361, 'duration': 3.725}, {'end': 3225.11, 'text': 'Maybe your colleague also has a Q&A use case.', 'start': 3222.246, 'duration': 2.864}, {'end': 3235.26, 'text': 'So your colleague comes to you saying, Can I reuse your pipeline for my use case? 
So reusability of pipelines is an important advantage.', 'start': 3225.37, 'duration': 9.89}, {'end': 3241.046, 'text': "So in the next example, we're going to reuse an open source Kubeflow pipelines.", 'start': 3235.581, 'duration': 5.465}], 'summary': 'Machine learning pipelines allow for easy reusability and collaboration, as demonstrated in a q&a language model example.', 'duration': 46.838, 'max_score': 3194.208, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ3194208.jpg'}, {'end': 3338.222, 'src': 'embed', 'start': 3311.984, 'weight': 6, 'content': [{'end': 3318.205, 'text': "So here as well, we're going to use a date timestamp to keep track of the model that we generate.", 'start': 3311.984, 'duration': 6.221}, {'end': 3322.006, 'text': "We have to specify a model name, and we're going to add the date to it.", 'start': 3318.485, 'duration': 3.521}, {'end': 3326.69, 'text': 'So at least we can always go back in time and see which model was trained when.', 'start': 3322.446, 'duration': 4.244}, {'end': 3329.573, 'text': "There's also some other parameters that we can tweak.", 'start': 3326.931, 'duration': 2.642}, {'end': 3330.895, 'text': 'I want to highlight two.', 'start': 3329.834, 'duration': 1.061}, {'end': 3333.057, 'text': 'So first of all, we have training steps.', 'start': 3331.115, 'duration': 1.942}, {'end': 3338.222, 'text': 'Training steps meaning the number of steps to use when tuning the model.', 'start': 3333.377, 'duration': 4.845}], 'summary': 'Using date timestamp to track model, with customizable parameters like training steps.', 'duration': 26.238, 'max_score': 3311.984, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ3311984.jpg'}], 'start': 2289.477, 'title': 'Automating llm model tuning and kubeflow pipelines', 'summary': 'Discusses automating and orchestrating llm model tuning and deployment using kubeflow pipelines, the advantages of using containers in ml workflows, building scalable pipelines in python, reusability of machine learning pipelines, and pipeline parameter optimization to facilitate easy pipeline re-runs with minor changes.', 'chapters': [{'end': 2540.335, 'start': 2289.477, 'title': 'Automating llm model tuning', 'summary': 'Discusses automating and orchestrating the process of tuning and deploying large language models (llms) using open-source frameworks, particularly focusing on kubeflow pipelines and the concept of orchestration, automation, and deployment.', 'duration': 250.858, 'highlights': ['The process of tuning a model requires gathering training data, training, evaluating, and integrating the trained model with the use case, often involving multiple iterations for experimentation and development.', 'Orchestration involves specifying the sequence of steps in a workflow, while automation ensures the execution of the workflow without manual intervention, facilitating the end-to-end execution of the code, particularly useful in automating the data preparation and model training process.', 'Kubeflow Pipelines, an open-source framework, is used for orchestrating and automating machine learning workflows, providing a drawing board for designing pipelines and a compiler for execution.']}, {'end': 2671.746, 'start': 2540.355, 'title': 'Introduction to kubeflow pipelines', 'summary': 'Introduces kubeflow pipelines as a tool to automate and orchestrate mlops workflows, emphasizing the use of dsl to define pipeline steps and 
execution logic.', 'duration': 131.391, 'highlights': ['Kubeflow Pipelines automates and orchestrates MLOps workflows, addressing the complexity and time-consuming nature of building and managing pipelines.', "The use of DSL (Domain Specific Language) in Kubeflow Pipelines allows for defining pipeline steps and execution logic in a clear and concise manner, enabling focus on the 'what' of the pipeline rather than the 'how'.", 'Components and pipelines are key concepts in Kubeflow Pipelines: each component is a self-contained piece of code representing a step in the workflow, and a pipeline chains components together.']}, {'end': 2822.147, 'start': 2671.966, 'title': 'Advantages of containers in ml workflow', 'summary': 'Explains the advantages of using containers in ml workflows, highlighting the independence from server management, installation of dependencies, and the ability to orchestrate multiple steps. It also discusses the dependency between components and the best practice of passing file locations instead of the actual data in ml workflows.', 'duration': 150.181, 'highlights': ['Containers provide independence from server management, OS installation, and dependency installation, allowing for easy transfer and execution of the software code on different environments.', 'ML workflows involve orchestrating multiple steps and handling dependencies between components, such as passing data from one component to another.', 'Best practice in ML workflows involves passing the path to the location of files instead of the actual data, particularly important for dealing with large data in containers.']}, {'end': 3193.587, 'start': 2822.427, 'title': 'Building scalable pipelines in python', 'summary': 'Explains the process of building a scalable pipeline in python using kubeflow, including defining components, orchestrating the pipeline, generating a yaml file, and executing the pipeline on vertex ai pipelines.', 'duration': 371.16, 'highlights': ['The process of building a scalable pipeline in Python using Kubeflow involves defining components, orchestrating the pipeline, generating a YAML file, and executing the pipeline on Vertex AI Pipelines.', 'Defining components in a scalable pipeline involves using Python functions with DSL decorators to specify the order of execution and data flow between steps.', 'Generating a YAML file for the pipeline involves using the compiler to specify components, dependencies, and the order of execution, which can then be executed on different environments such as Vertex AI Pipelines.', 'Executing the pipeline on Vertex AI Pipelines allows for serverless execution, removing the need to manage Kubernetes clusters and enabling the visualization of component execution and parameters.']}, {'end': 3358.129, 'start': 3194.208, 'title': 'Reusability of machine learning pipelines', 'summary': 'Discusses the reuse of machine learning pipelines, specifically focusing on the advantages of reusability, using an open-source kubeflow pipeline for supervised fine-tuning, and the configuration setup required for executing the pipeline.', 'duration': 163.921, 'highlights': ['The advantage of reusing a pipeline is the ability to share it with colleagues for reuse, as demonstrated in the example of building a Q&A language model pipeline for fine-tuning a PaLM model and specifying parameters for supervised fine-tuning.
(Relevance: 5)', 'The reusability of pipelines is further emphasized by the use of an open-source Kubeflow pipeline, which eliminates the need to build the pipeline from scratch and only requires the specification of certain parameters, such as the training and evaluation data files. (Relevance: 4)', 'The importance of setting up configurations, such as specifying the model name with a date timestamp for versioning and tweaking parameters like training steps, is highlighted, demonstrating the significance of versioning and understanding the training process. (Relevance: 3)']}, {'end': 3550.767, 'start': 3358.209, 'title': 'Pipeline parameter optimization', 'summary': 'Discusses setting parameters for the PaLM model, including training steps, evaluation interval, and model name, which is text-bison, for question and answering. It emphasizes the flexibility of tweaking pipeline arguments to accommodate different datasets and model types, facilitating easy pipeline re-runs with minor changes.', 'duration': 192.558, 'highlights': ['The best practice for the PaLM model is to set the number of training steps for extractive Q&A between 100 and 500, and in this case, 200 is chosen as the number of training steps.', 'The evaluation interval is set to default at 20, and it specifies the frequency at which a trained model is evaluated against the created evaluation set.', 'The specified arguments for the pipeline include the project ID, region, model name (text-bison), training steps, evaluation interval, and evaluation data URI, allowing flexibility for easy pipeline re-runs with different arguments.', 'Enabling caching ensures that the pipeline will use caching when rerun, avoiding re-execution of steps unless the code or arguments are updated.']}], 'duration': 1261.29, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ2289477.jpg', 'highlights': ['Kubeflow Pipelines automates and orchestrates MLOps workflows, addressing the complexity and time-consuming nature of building and managing pipelines.', 'Containers provide independence from server management, OS installation, and dependency installation, allowing for easy transfer and execution of the software code on different environments.', 'The process of tuning a model involves gathering training data, training, evaluating, and integrating the trained model with the use case, often involving multiple iterations for experimentation and development.', 'Orchestration involves specifying the sequence of steps in a workflow, while automation ensures the execution of the workflow without manual intervention, facilitating the end-to-end execution of the code, particularly useful in automating the data preparation and model training process.', 'The advantage of reusing a pipeline is the ability to share it with colleagues for reuse, as demonstrated in the example of building a Q&A language model pipeline for fine-tuning a PaLM model and specifying parameters for supervised fine-tuning.', 'The reusability of pipelines is further emphasized by the use of an open-source Kubeflow pipeline, which eliminates the need to build the pipeline from scratch and only requires the specification of certain parameters, such as the training and evaluation data files.', 'The importance of setting up configurations, such as specifying the model name with a date timestamp for versioning and tweaking parameters like training steps, is highlighted, demonstrating the significance of versioning and understanding the training process.']},
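The configuration discussion above boils down to a small dictionary of pipeline arguments. A minimal sketch with hypothetical key names modeled on the parameters named in the lesson (a date-stamped model name for versioning, text-bison as the base model, 200 training steps for extractive Q&A, evaluation every 20 steps); the reused fine-tuning pipeline defines the actual keys, which may differ.

```python
from datetime import datetime

# Date-stamp the model name so every run is versioned and you can always
# go back and see which model was trained when.
date_stamp = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")

# Hypothetical argument names; the reused pipeline's YAML defines the real ones.
pipeline_arguments = {
    "project_id": "my-gcp-project",              # assumption: your GCP project
    "location": "us-central1",                   # assumption: your region
    "model_display_name": f"qa-model-{date_stamp}",
    "large_model_reference": "text-bison@001",   # PaLM model for Q&A
    "train_steps": 200,         # extractive Q&A: 100-500 recommended
    "evaluation_interval": 20,  # evaluate the model every 20 tuning steps
    "evaluation_data_uri": "gs://my-bucket/eval.jsonl",  # assumption
}
```

Because the arguments live in one dictionary, rerunning the pipeline on a different dataset or model type is a matter of changing a few values rather than touching the pipeline itself, and enabling caching means unchanged steps are not re-executed.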
{'end': 4889.939, 'segs': [{'end': 3626.576, 'src': 'embed', 'start': 3553.356, 'weight': 0, 'content': [{'end': 3559.76, 'text': "We're not going to execute this pipeline now because if we have to run it, it might take a whole day to execute.", 'start': 3553.356, 'duration': 6.404}, {'end': 3567.105, 'text': "Also, it's quite expensive to run this pipeline because you need accelerators like GPUs or TPUs.", 'start': 3560.141, 'duration': 6.964}, {'end': 3570.227, 'text': "In the next lab, we're going to talk about deployment.", 'start': 3567.646, 'duration': 2.581}, {'end': 3578.033, 'text': "So once we've tuned a model, of course, we also have to deploy it in order to use it in our LLM system.", 'start': 3570.768, 'duration': 7.265}, {'end': 3579.974, 'text': 'And see you in the next lesson.', 'start': 3578.733, 'duration': 1.241}, {'end': 3586.723, 'text': 'So now you will actually get to use the tuned model to make predictions.', 'start': 3582.981, 'duration': 3.742}, {'end': 3589.144, 'text': 'The team has deployed one for you.', 'start': 3587.543, 'duration': 1.601}, {'end': 3594.627, 'text': 'But there is still more to do in order to integrate this safely into a real-life application.', 'start': 3590.265, 'duration': 4.362}, {'end': 3602.511, 'text': 'We will need to make sure the model will work the same in production, but also that it can be made safer and more responsible.', 'start': 3595.847, 'duration': 6.664}, {'end': 3609.354, 'text': 'So you will first want to make sure production data has the same format as the data used in training.', 'start': 3603.871, 'duration': 5.483}, {'end': 3614.731, 'text': "You'll look into the response and how to get insights on safety attributes.", 'start': 3611.004, 'duration': 3.727}, {'end': 3622.453, 'text': 'And are there any sources that the response is based on?
Okay, welcome to the last lab of this course.', 'start': 3615.272, 'duration': 7.181}, {'end': 3626.576, 'text': "Today we're going to talk about predictions, prompts, and safety scores.", 'start': 3623.214, 'duration': 3.362}], 'summary': 'Pipeline execution delayed, deployment discussed, focus on model integration and safety in real-life applications.', 'duration': 73.22, 'max_score': 3553.356, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ3553356.jpg'}, {'end': 3796.473, 'src': 'embed', 'start': 3746.385, 'weight': 6, 'content': [{'end': 3749.949, 'text': 'And you batch all of the examples and you run the predictions in batch.', 'start': 3746.385, 'duration': 3.564}, {'end': 3757.158, 'text': 'A REST API means that we take our model and deploy it as an API.', 'start': 3750.249, 'duration': 6.909}, {'end': 3762.243, 'text': "This means that it's online and you can access it from a service.", 'start': 3757.378, 'duration': 4.865}, {'end': 3768.524, 'text': "So let's say you have a chat application that lets you ask Stack Overflow type of questions.", 'start': 3762.283, 'duration': 6.241}, {'end': 3772.345, 'text': 'So when I come into the application, I ask a question.', 'start': 3768.844, 'duration': 3.501}, {'end': 3775.826, 'text': 'The request goes to the API,', 'start': 3772.345, 'duration': 3.481}, {'end': 3784.148, 'text': 'the model behind the API does a prediction, and the prediction goes back into the user interface where I see the result.', 'start': 3775.826, 'duration': 8.322}, {'end': 3788.129, 'text': 'So this is a more real-time use case.', 'start': 3784.828, 'duration': 3.301}, {'end': 3792.29, 'text': 'You need to get the response with a low latency.', 'start': 3789.028, 'duration': 3.262}, {'end': 3796.473, 'text': "There are different ways you can deploy a model as a REST API.", 'start': 3792.81, 'duration': 3.663}], 'summary': 'Deploy model as rest api for real-time predictions with low latency.', 'duration': 50.088, 'max_score': 3746.385, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ3746385.jpg'}, {'end': 3924.137, 'src': 'embed', 'start': 3897.876, 'weight': 8, 'content': [{'end': 3903.778, 'text': 'We deployed multiple models to make sure we can spread all of the traffic over the three models.', 'start': 3897.876, 'duration': 5.902}, {'end': 3912.64, 'text': "We're going to use a very basic form of load balancing to choose one of the endpoints to send our prompt to and get a prediction.", 'start': 3904.178, 'duration': 8.462}, {'end': 3914.621, 'text': "So we're going to use the random library.", 'start': 3912.94, 'duration': 1.681}, {'end': 3924.137, 'text': "And we're going to randomly choose one endpoint from our list to get a prediction from.", 'start': 3917.415, 'duration': 6.722}], 'summary': 'Multiple models were deployed to evenly distribute traffic, using basic load balancing with the random library to select endpoints.', 'duration': 26.261, 'max_score': 3897.876, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ3897876.jpg'},
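A minimal sketch of that basic load-balancing idea using the `random` library with the Vertex AI SDK. The project, region, and endpoint IDs are placeholders, and the `{"prompt": ...}` instance format is an assumption about the deployed model's input schema.

```python
import random
from google.cloud import aiplatform

# Placeholders: substitute your own project, region, and endpoint IDs.
aiplatform.init(project="my-gcp-project", location="us-central1")

endpoint_ids = ["1111111111", "2222222222", "3333333333"]
endpoints = [aiplatform.Endpoint(endpoint_id) for endpoint_id in endpoint_ids]

# Very basic load balancing: pick one of the three deployed endpoints at random.
endpoint = random.choice(endpoints)

prompt = "How can I load a CSV file using pandas?"
response = endpoint.predict(instances=[{"prompt": prompt}])
print(response.predictions[0])
```

Random selection is the simplest way to spread traffic; a production system would typically put a real load balancer or gateway in front of the endpoints instead.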
{'end': 4004.36, 'src': 'embed', 'start': 3951.413, 'weight': 3, 'content': [{'end': 3959.594, 'text': "And since we've done tuning on the Stack Overflow dataset, I think we should ask it a Python question,", 'start': 3951.413, 'duration': 8.181}, {'end': 3964.115, 'text': 'because we selected all of the Python questions from the Stack Overflow dataset.', 'start': 3959.594, 'duration': 4.521}, {'end': 3967.576, 'text': 'We want to stay as close to what we trained on as possible.', 'start': 3964.275, 'duration': 3.301}, {'end': 3970.156, 'text': "I'll also explain later why this is important.", 'start': 3967.716, 'duration': 2.44}, {'end': 3972.997, 'text': 'Let me ask a straightforward question.', 'start': 3970.636, 'duration': 2.361}, {'end': 3980.063, 'text': "How can I load a CSV file using pandas? This is something we've done in the first lab.", 'start': 3974.057, 'duration': 6.006}, {'end': 3981.404, 'text': 'We have our prompt.', 'start': 3980.383, 'duration': 1.021}, {'end': 3984.847, 'text': "We've loaded the API.", 'start': 3982.465, 'duration': 2.382}, {'end': 3990.413, 'text': 'Now we can send our prompt to the API and get our response.', 'start': 3985.588, 'duration': 4.825}, {'end': 3993.015, 'text': 'For this, we can use the Vertex AI SDK.', 'start': 3990.793, 'duration': 2.222}, {'end': 3998.458, 'text': "And we're just going to run a dot predict on the deployed model.", 'start': 3994.056, 'duration': 4.402}, {'end': 4002.159, 'text': 'And we pass the prompt into the dot predict.', 'start': 3998.738, 'duration': 3.421}, {'end': 4004.36, 'text': 'So the prompt goes to the API.', 'start': 4002.859, 'duration': 1.501}], 'summary': 'Tuned stack overflow dataset for python questions, using vertex ai sdk for prompt response.', 'duration': 52.947, 'max_score': 3951.413, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ3951413.jpg'}, {'end': 4195.264, 'src': 'embed', 'start': 4170.923, 'weight': 11, 'content': [{'end': 4179.191, 'text': "I've talked about packaging, deploying and versioning your model, but there are also other things that we have to take into consideration,", 'start': 4170.923, 'duration': 8.268}, {'end': 4180.551, 'text': 'like doing model monitoring.', 'start': 4179.191, 'duration': 1.36}, {'end': 4183.175, 'text': "There are different ways you can monitor your model.", 'start': 4180.732, 'duration': 2.443}, {'end': 4186.077, 'text': "First of all, there are the operational metrics.", 'start': 4183.475, 'duration': 2.602}, {'end': 4195.264, 'text': "Let's say, how often do you send the prediction to the API? Of course, you also want to evaluate the performance of your model in production.", 'start': 4186.457, 'duration': 8.807}], 'summary': 'Discussed model monitoring, including operational metrics and evaluating model performance in production.', 'duration': 24.341, 'max_score': 4170.923, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ4170923.jpg'},
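As a sketch of those operational metrics, you could wrap the prediction call so every request is counted and timed. `predict_with_metrics` is a hypothetical helper, not part of the Vertex AI SDK; in production you would ship these numbers to a monitoring system rather than the log.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
request_count = 0  # simple operational metric: how often predictions are served

def predict_with_metrics(endpoint, prompt: str):
    # Hypothetical monitoring wrapper: counts requests and logs the latency
    # of each prediction call, which varies with model and prompt size.
    global request_count
    request_count += 1
    start = time.perf_counter()
    response = endpoint.predict(instances=[{"prompt": prompt}])
    latency_ms = (time.perf_counter() - start) * 1000
    logging.info("request #%d served in %.1f ms", request_count, latency_ms)
    return response
```

Latency numbers like these are exactly what you would bring to stakeholders when agreeing on what response time is permissible for the use case.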
{'end': 4350.395, 'src': 'embed', 'start': 4327.005, 'weight': 4, 'content': [{'end': 4334.488, 'text': 'This means that the model that we trained was trained on data that has an instruction and a question.', 'start': 4327.005, 'duration': 7.483}, {'end': 4336.909, 'text': "But now in production, we're only sending the question.", 'start': 4334.828, 'duration': 2.081}, {'end': 4341.971, 'text': "So there's a mismatch between what we trained on and what we have available in production.", 'start': 4337.109, 'duration': 4.862}, {'end': 4343.592, 'text': "There's a skew between the two.", 'start': 4342.251, 'duration': 1.341}, {'end': 4350.395, 'text': "It's very important that your production data is very much the same as the data that you trained on.", 'start': 4343.932, 'duration': 6.463}], 'summary': 'Model trained on instruction and question data, but in production only sending question, causing a skew and mismatch.', 'duration': 23.39, 'max_score': 4327.005, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ4327005.jpg'}, {'end': 4504.312, 'src': 'embed', 'start': 4473.026, 'weight': 5, 'content': [{'end': 4477.488, 'text': "The thing with large language models is that they can generate output that you don't expect,", 'start': 4473.026, 'duration': 4.462}, {'end': 4484.077, 'text': 'including text that can be offensive, insensitive, or maybe factually incorrect.', 'start': 4478.553, 'duration': 5.524}, {'end': 4485.138, 'text': "What's more,", 'start': 4484.417, 'duration': 0.721}, {'end': 4495.205, 'text': 'the incredible versatility of large language models is also what makes it difficult to predict exactly what kind of unintended or unforeseen outputs they might produce.', 'start': 4485.138, 'duration': 10.067}, {'end': 4504.312, 'text': "So it's important for you as a developer and practitioner to understand and test these models to make sure you deploy them safely and responsibly.", 'start': 4495.465, 'duration': 8.847}], 'summary': 'Large language models can produce unexpected and potentially offensive or incorrect text, making it crucial for developers to test and deploy them responsibly.', 'duration': 31.286, 'max_score': 4473.026, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ4473026.jpg'}, {'end': 4631.252, 'src': 'embed', 'start': 4609.862, 'weight': 10, 'content': [{'end': 4619.87, 'text': 'So the second level is where you as a practitioner and developer can use safety attributes and safety scores to set up your own thresholds.', 'start': 4609.862, 'duration': 10.008}, {'end': 4624.731, 'text': "Let's have a look at the safety attributes to get a better understanding of them.", 'start': 4620.33, 'duration': 4.401}, {'end': 4631.252, 'text': "So from our response, we're going to use the key safety attributes to retrieve the safety attributes and the scores.", 'start': 4625.351, 'duration': 5.901}], 'summary': 'Practitioners can use safety attributes and scores to set thresholds.', 'duration': 21.39, 'max_score': 4609.862, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ4609862.jpg'},
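Returning to the training/serving skew called out above: the serving-time fix the lesson describes is to assemble production prompts in exactly the format the tuning data used, an instruction plus the question. A minimal sketch; the instruction wording and helper name are assumptions, since the real instruction string comes from the tuning dataset.

```python
# Hypothetical instruction text; in practice, reuse the exact instruction
# string the tuning examples were built with.
INSTRUCTION = "Please answer the following Stack Overflow question about Python."

def build_prompt(question: str) -> str:
    # Match the training format (instruction + question) to avoid
    # training/serving skew.
    return f"{INSTRUCTION}\n\n{question}"

prompt = build_prompt("How can I load a CSV file using pandas?")
```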
{'end': 4694.151, 'src': 'embed', 'start': 4664.163, 'weight': 13, 'content': [{'end': 4666.923, 'text': 'You can also see here the probability scores.', 'start': 4664.163, 'duration': 2.76}, {'end': 4674.385, 'text': 'So each safety attribute is associated with a confidence score, a probability score between zero and one,', 'start': 4667.724, 'duration': 6.661}, {'end': 4682.047, 'text': 'and is rounded to one decimal place, reflecting the likelihood of the content being unsafe.', 'start': 4675.305, 'duration': 6.742}, {'end': 4684.508, 'text': 'So there are scores for each category.', 'start': 4682.427, 'duration': 2.081}, {'end': 4690.29, 'text': 'You can see here that the probability score for most categories is 0.1.', 'start': 4685.168, 'duration': 5.122}, {'end': 4694.151, 'text': 'I mentioned that the probability score can be between 0 and 1.', 'start': 4690.29, 'duration': 3.861}], 'summary': 'Probability scores for safety attributes range from 0 to 1, mostly at 0.1.', 'duration': 29.988, 'max_score': 4664.163, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ4664163.jpg'}, {'end': 4857.156, 'src': 'embed', 'start': 4828.887, 'weight': 14, 'content': [{'end': 4834.794, 'text': 'From the citation metadata we can take the citations to check if the response is cited from somewhere.', 'start': 4828.887, 'duration': 5.907}, {'end': 4840.48, 'text': "Let's use pretty print again to see if there's any citation in the response.", 'start': 4835.214, 'duration': 5.266}, {'end': 4844.53, 'text': "That's good.", 'start': 4844.13, 'duration': 0.4}, {'end': 4846.931, 'text': 'The response is not cited from anywhere.', 'start': 4844.97, 'duration': 1.961}, {'end': 4854.275, 'text': "If there is a citation, the model will return the page it's cited from, like a website.", 'start': 4847.452, 'duration': 6.823}, {'end': 4857.156, 'text': "We don't have any citation in this example.", 'start': 4854.815, 'duration': 2.341}], 'summary': 'Using citation metadata, no citations found in the response.', 'duration': 28.269, 'max_score': 4828.887, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ4828887.jpg'}], 'start': 3553.356, 'title': 'Deploying ml models', 'summary': 'Addresses challenges in ml pipeline, importance of deployment after model tuning, methods like batch and rest api, vertex ai sdk, considerations for production workloads, and leveraging safety attributes.', 'chapters': [{'end': 3626.576, 'start': 3553.356, 'title': 'Predictions and deployment in ml', 'summary': 'Discusses the challenges of running an expensive and time-consuming ml pipeline, the importance of deployment after model tuning, and the need to ensure safety and integration in real-life applications.', 'duration': 73.22, 'highlights': ['The importance of deployment after model tuning and the need for integration into a real-life application, emphasizing the safety and responsibility aspects.', 'Challenges related to running an expensive and time-consuming ML pipeline, which might take a whole day to execute and requires accelerators like GPUs or TPUs.', 'The necessity to ensure production data has the same format as the training data and the need to inspect safety attributes and sources of the response.']}, {'end': 3972.997, 'start': 3627.016, 'title': 'Deploying machine learning models', 'summary': 'Discusses
the deployment of machine learning models using batch and rest api methods, including the use of vertex ai sdk and the process of selecting and utilizing endpoints for model predictions.', 'duration': 345.981, 'highlights': ['The chapter explains the concept of deploying machine learning models through batch and REST API methods, with the use of Vertex AI SDK and the selection of endpoints for model predictions.', 'It discusses the batch method of deploying models for offline processing, where trained models are used to predict outcomes in bulk, such as scoring customer reviews once a week.', 'The REST API method for deploying models is detailed, emphasizing real-time use cases and the need for low latency responses, including the use of Flask and FastAPI libraries for packaging models as APIs.', 'The process of selecting and utilizing endpoints for model predictions is explained, including the use of random selection for load balancing among multiple endpoints.', 'The importance of staying close to the training data when formulating prompts for model predictions is highlighted, with a specific example of asking a Python question based on the tuned Stack Overflow dataset.']}, {'end': 4495.205, 'start': 3974.057, 'title': 'Loading csv file using pandas and deploying models', 'summary': 'Covers loading a csv file using pandas and deploying a model with vertex ai sdk, discussing latency, model monitoring, and considerations for production machine learning workloads, including operational metrics, performance evaluation, safety, scalability, and permissible latency.', 'duration': 521.148, 'highlights': ['The chapter explains the process of loading a CSV file using pandas and deploying a model with Vertex AI SDK, emphasizing the importance of considering latency, which can vary based on the size of the model and prompt, and the need for formatting the response using pretty print. It also highlights the significance of model monitoring for operational metrics, performance evaluation, safety, and scalability, along with the importance of discussing permissible latency with stakeholders. (Relevance score: 5)', "The chapter emphasizes the importance of ensuring consistency between the production data and the data used for model training to maintain model performance, necessitating the addition of an instruction to the question prompt before sending it to the model. It also demonstrates the process of combining instruction and question to create a prompt and sending it to the API to obtain the model's response. (Relevance score: 4)", 'The chapter discusses the challenges associated with large language models, such as generating unexpected output that may be offensive, insensitive, or factually incorrect, and the difficulty in predicting unintended or unforeseen outputs due to the versatility of large language models.
(Relevance score: 3)']}, {'end': 4889.939, 'start': 4495.465, 'title': 'Leveraging safety attributes in ai models', 'summary': 'Discusses leveraging safety attributes in ai models to set up safety thresholds, including probabilities and severity scores, and checking for citations in the model response.', 'duration': 394.474, 'highlights': ['You can use safety attributes and safety scores to set up your own thresholds based on probabilities and severity scores.', 'The safety attributes include categories such as finance and politics, alongside safety ratings for different categories.', 'The probability score for most categories is 0.1, reflecting the likelihood of the content being unsafe.', "Deciding the threshold for safety attributes is dependent on the user's requirements and the type of users they are working with.", 'The response includes citation metadata, allowing the checking of citations in the model response.']}], 'duration': 1336.583, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/q5CtyGT5SkQ/pics/q5CtyGT5SkQ3553356.jpg', 'highlights': ['The importance of deployment after model tuning and integration into a real-life application, emphasizing safety and responsibility.', 'Challenges related to running an expensive and time-consuming ML pipeline, requiring accelerators like GPUs or TPUs.', 'The necessity to ensure production data has the same format as the training data and inspect safety attributes and sources of the response.', 'The process of loading a CSV file using pandas and deploying a model with Vertex AI SDK, emphasizing the importance of considering latency and formatting the response.', 'The importance of ensuring consistency between production data and data used for model training, necessitating the addition of an instruction to the question prompt before sending it to the model.', 'The challenges associated with large language models, such as generating unexpected output that may be offensive, insensitive, or factually incorrect.', 'The concept of deploying machine learning models through batch and REST API methods, with the use of Vertex AI SDK and the selection of endpoints for model predictions.', 'The REST API method for deploying models is detailed, emphasizing real-time use cases and the need for low latency responses, including the use of Flask and FastAPI libraries for packaging models as APIs.', 'The process of selecting and utilizing endpoints for model predictions, including the use of random selection for load balancing among multiple endpoints.', 'The importance of staying close to the training data when formulating prompts for model predictions, with a specific example of asking a Python question based on the tuned Stack Overflow dataset.', 'The use of safety attributes and safety scores to set up thresholds based on probabilities and severity scores, including categories like finance and politics, as well as safety ratings.', 'The significance of model monitoring for operational metrics, performance evaluation, safety, and scalability, along with the importance of discussing permissible latency with stakeholders.', 'The challenges associated with large language models, such as predicting unintended or unforeseen outputs due to their versatility.', 'The probability score for most categories is 0.1, reflecting the likelihood of the content being unsafe, and deciding the threshold for safety attributes is dependent on user requirements and the type of users they are working with.', 'The response includes citation metadata, allowing the
checking of citations in the model response.']}], 'highlights': ['Automated processes for designing and deploying LLM-based applications are crucial to streamline workflow and enhance productivity.', 'The LLMOps pipeline involves data preparation, versioning, supervised tuning, artifact generation, model deployment, prompt usage, and safety evaluation.', 'Kubeflow Pipelines automates and orchestrates MLOps workflows, addressing the complexity and time-consuming nature of building and managing pipelines.', 'Dealing with large volumes of text data is crucial for LLM Ops, including retrieving Stack Overflow text data from a data warehouse, using SQL to handle large data, and modifying data to tune a model.', 'The data is divided into training and evaluation with an 80-20 split by default.']}
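To tie the safety-attribute and citation discussion together, here is a minimal sketch of thresholding on the returned scores. It assumes the prediction dictionary carries a 'safetyAttributes' entry with parallel 'categories' and 'scores' lists plus a 'citationMetadata' entry, as pretty-printed in the lesson; the 0.5 threshold is an arbitrary example to tune for your own use case and audience.

```python
from pprint import pprint

SAFETY_THRESHOLD = 0.5  # example cut-off; choose per use case and user base

def check_response(prediction: dict, threshold: float = SAFETY_THRESHOLD) -> bool:
    # Assumed response layout: parallel 'categories' and 'scores' lists,
    # each score a probability between 0 and 1 (rounded to one decimal place)
    # reflecting the likelihood of the content being unsafe.
    attributes = prediction.get("safetyAttributes", {})
    for category, score in zip(attributes.get("categories", []),
                               attributes.get("scores", [])):
        if score > threshold:
            print(f"Blocked: category '{category}' scored {score}")
            return False

    # Citation metadata: an empty citations list means the response is not
    # cited from anywhere; otherwise the source pages are listed here.
    pprint(prediction.get("citationMetadata", {}).get("citations", []))
    return True
```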