title
State of GPT | BRK216HFS
description
Learn about the training pipeline of GPT assistants like ChatGPT, from tokenization to pretraining, supervised finetuning, and Reinforcement Learning from Human Feedback (RLHF). Dive deeper into practical techniques and mental models for the effective use of these models, including prompting strategies, finetuning, the rapidly growing ecosystem of tools, and their future extensions.
*Speakers:*
* Andrej Karpathy
*Session Information:*
This video is one of many sessions delivered for the Microsoft Build 2023 event. View the full session schedule and learn more about Microsoft Build at https://build.microsoft.com
BRK216HFS | English (US) | AI
#MSBuild
detail
{'title': 'State of GPT | BRK216HFS', 'heatmap': [{'end': 259.94, 'start': 228.361, 'weight': 0.714}, {'end': 871.208, 'start': 843.493, 'weight': 0.711}, {'end': 1536.99, 'start': 1509.064, 'weight': 1}], 'summary': 'Covers gpt training stages with internet-scale datasets, reinforcement learning for prompt completion, cognitive disparities between human reasoning and gpt models, transformer model optimization, and llms and gpt-4 utilization including prompt engineering techniques and fine-tuning.', 'chapters': [{'end': 770.935, 'segs': [{'end': 130.014, 'src': 'embed', 'start': 91.572, 'weight': 1, 'content': [{'end': 96.295, 'text': 'and this diagram is not to scale because this stage is where all of the computational work basically happens.', 'start': 91.572, 'duration': 4.723}, {'end': 101.278, 'text': 'This is 99% of the training compute time and also flops.', 'start': 96.395, 'duration': 4.883}, {'end': 110.765, 'text': 'And so this is where we are dealing with internet scale data sets with thousands of GPUs in the supercomputer and also months of training potentially.', 'start': 101.979, 'duration': 8.786}, {'end': 118.521, 'text': 'The other three stages are fine tuning stages that are much more along the lines of small few number of GPUs and hours or days.', 'start': 111.405, 'duration': 7.116}, {'end': 122.068, 'text': "So let's take a look at the pre-training stage to achieve a base model.", 'start': 119.342, 'duration': 2.726}, {'end': 127.712, 'text': "First, we're going to gather a large amount of data.", 'start': 124.008, 'duration': 3.704}, {'end': 130.014, 'text': "here's an example of what we call a data mixture.", 'start': 127.712, 'duration': 2.302}], 'summary': '99% of training compute time and flops occur in the main stage, dealing with internet-scale data sets using thousands of gpus and months of training, while fine-tuning stages use fewer gpus and take hours or days.', 'duration': 38.442, 'max_score': 91.572, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A91572.jpg'}, {'end': 292.18, 'src': 'heatmap', 'start': 228.361, 'weight': 0, 'content': [{'end': 232.864, 'text': 'The context length is usually something like 2,000, 4,000, or nowadays even 100,000.', 'start': 228.361, 'duration': 4.503}, {'end': 240.069, 'text': "And this governs the maximum number of integers that the GPT will look at when it's trying to predict the next integer in a sequence.", 'start': 232.864, 'duration': 7.205}, {'end': 243.293, 'text': 'You can see that, roughly,', 'start': 242.233, 'duration': 1.06}, {'end': 246.415, 'text': 'the number of parameters is, say, 65 billion for LLaMA.', 'start': 243.293, 'duration': 3.122}, {'end': 246.975, 'text': 'Now, even though', 'start': 246.415, 'duration': 0.56}, {'end': 251.437, 'text': "LLaMA has only 65B parameters, compared to GPT-3's 175 billion parameters,", 'start': 246.975, 'duration': 4.462}, {'end': 259.94, 'text': "LLaMA is a significantly more powerful model, and intuitively that's because the model is trained for significantly longer, in this case 1.4 trillion tokens", 'start': 251.437, 'duration': 8.503}, {'end': 261.76, 'text': 'instead of just 300 billion tokens.', 'start': 259.94, 'duration': 1.82}, {'end': 265.462, 'text': "So you shouldn't judge the power of a model just by the number of parameters that it contains.", 'start': 261.76, 'duration': 3.702}, {'end': 273.688, 'text': "Below I'm showing some tables of rough hyperparameters that typically go into specifying the transformer neural network.", 'start': 266.943, 'duration': 6.745}, {'end': 277.03, 'text': 'So the number of heads, the dimension size, number of layers and so on.', 'start': 274.088, 'duration': 2.942}, {'end': 280.412, 'text': "And on the bottom I'm showing some training hyperparameters.", 'start': 277.77, 'duration': 2.642}, {'end': 292.18, 'text': 'So for example to train the 65B model Meta used 2000 GPUs, roughly 21 
days of training and roughly several million dollars.', 'start': 281.553, 'duration': 10.627}], 'summary': 'Lama, with 65b parameters, is more powerful than gpt-3 due to training on 1.4t tokens. 2000 gpus, 21 days, million $ for 65b model training.', 'duration': 40.743, 'max_score': 228.361, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A228361.jpg'}, {'end': 577.53, 'src': 'embed', 'start': 551.337, 'weight': 4, 'content': [{'end': 557.56, 'text': 'Now around the time of GPT-2 people noticed that actually even better than fine tuning you can actually prompt these models very effectively.', 'start': 551.337, 'duration': 6.223}, {'end': 560.041, 'text': 'So these are language models and they want to complete documents.', 'start': 557.82, 'duration': 2.221}, {'end': 565.284, 'text': 'So you can actually trick them into performing tasks just by arranging these fake documents.', 'start': 560.501, 'duration': 4.783}, {'end': 570.887, 'text': 'So in this example for example we have some passage and then we sort of like do QA QA QA.', 'start': 565.744, 'duration': 5.143}, {'end': 577.53, 'text': "This is called a few shot prompt and then we do Q and then as the transformer is trying to complete the document it's actually answering our question.", 'start': 571.067, 'duration': 6.463}], 'summary': 'Gpt-2 can be prompted effectively with fake documents to perform tasks, providing answers as it completes documents.', 'duration': 26.193, 'max_score': 551.337, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A551337.jpg'}], 'start': 8.096, 'title': 'Gpt training and application', 'summary': 'Highlights gpt training stages, involving internet-scale datasets, thousands of gpus, and months of training, emphasizing the types of datasets used such as common crawl, c4, github, wikipedia, and high-quality datasets. 
it also discusses the pre-processing steps for training gpt and the subsequent supervised fine-tuning stage for creating gpt assistants.', 'chapters': [{'end': 151.493, 'start': 8.096, 'title': 'Gpt training and application', 'summary': 'Highlights gpt training stages, pre-training being the most computationally intensive, involving internet-scale datasets, thousands of gpus, and months of training, while fine-tuning stages require fewer gpus and less time, and it emphasizes the types of datasets used in the pre-training stage such as common crawl, c4, github, wikipedia, and high-quality datasets.', 'duration': 143.397, 'highlights': ['The pre-training stage is the most computationally intensive, requiring internet-scale datasets, thousands of GPUs, and months of training, while the fine-tuning stages require fewer GPUs and less time.', 'The pre-training stage involves gathering a large amount of data from various sources such as common crawl, C4, GitHub, Wikipedia, and high-quality datasets like books archive and stack exchange.']}, {'end': 770.935, 'start': 151.493, 'title': 'Gpt pre-training and fine-tuning', 'summary': 'Discusses the pre-processing steps for training gpt, including tokenization, hyperparameters, and training process, with specific emphasis on gpt-4 and lama models, and the subsequent supervised fine-tuning stage for creating gpt assistants.', 'duration': 619.442, 'highlights': ["The LAMA model has 65 billion parameters and is trained on 1.4 trillion tokens, making it more powerful than GPT-3's 175 billion parameters trained on 300 billion tokens.", 'The vocabulary size for pre-training is usually a couple of 10,000 tokens, and the context length is typically 2,000 to 100,000, governing the maximum number of integers the GPT will look at when predicting the next integer in a sequence.', 'Meta used 2000 GPUs, roughly 21 days of training, and several million dollars to train the 65B model, providing insights into the resources required for 
pre-training.', 'The process of pre-training involves laying tokens into data batches, feeding them into the transformer, and using them to predict the next token in a sequence.', 'Supervised fine-tuning involves collecting small but high-quality datasets, such as prompt and ideal response pairs, and conducting language modeling to create GPT assistants.']}], 'duration': 762.839, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A8096.jpg', 'highlights': ["The LAMA model has 65 billion parameters and is trained on 1.4 trillion tokens, surpassing GPT-3's 175 billion parameters trained on 300 billion tokens.", 'The pre-training stage is the most computationally intensive, requiring internet-scale datasets, thousands of GPUs, and months of training.', 'The pre-training stage involves gathering a large amount of data from various sources such as common crawl, C4, GitHub, Wikipedia, and high-quality datasets like books archive and stack exchange.', 'Meta used 2000 GPUs, roughly 21 days of training, and several million dollars to train the 65B model, providing insights into the resources required for pre-training.', 'Supervised fine-tuning involves collecting small but high-quality datasets, such as prompt and ideal response pairs, and conducting language modeling to create GPT assistants.']}, {'end': 1213.124, 'segs': [{'end': 871.208, 'src': 'embed', 'start': 843.493, 'weight': 4, 'content': [{'end': 849.716, 'text': 'Then we can follow that with something that looks very much kind of like a binary classification on all the possible pairs between these completions.', 'start': 843.493, 'duration': 6.223}, {'end': 855.88, 'text': 'So what we do now is we lay out our prompt in rows and the prompts is identical across all three rows here.', 'start': 850.417, 'duration': 5.463}, {'end': 861.463, 'text': "So it's all the same prompt but the completion this very and so the yellow tokens are coming from the SFT model.", 
'start': 856.24, 'duration': 5.223}, {'end': 866.686, 'text': 'Then what we do is we append another special reward readout token at the end.', 'start': 862.203, 'duration': 4.483}, {'end': 871.208, 'text': 'And we basically only supervise the transformer at this single green token.', 'start': 867.423, 'duration': 3.785}], 'summary': 'Binary classification on completion pairs with single token supervision.', 'duration': 27.715, 'max_score': 843.493, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A843493.jpg'}, {'end': 871.208, 'src': 'heatmap', 'start': 843.493, 'weight': 0.711, 'content': [{'end': 849.716, 'text': 'Then we can follow that with something that looks very much kind of like a binary classification on all the possible pairs between these completions.', 'start': 843.493, 'duration': 6.223}, {'end': 855.88, 'text': 'So what we do now is we lay out our prompt in rows and the prompts is identical across all three rows here.', 'start': 850.417, 'duration': 5.463}, {'end': 861.463, 'text': "So it's all the same prompt but the completion this very and so the yellow tokens are coming from the SFT model.", 'start': 856.24, 'duration': 5.223}, {'end': 866.686, 'text': 'Then what we do is we append another special reward readout token at the end.', 'start': 862.203, 'duration': 4.483}, {'end': 871.208, 'text': 'And we basically only supervise the transformer at this single green token.', 'start': 867.423, 'duration': 3.785}], 'summary': 'Binary classification on prompt completions, supervising transformer at single token', 'duration': 27.715, 'max_score': 843.493, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A843493.jpg'}, {'end': 943.036, 'src': 'embed', 'start': 916.417, 'weight': 1, 'content': [{'end': 923.099, 'text': 'now, because we have a reward model, we can score the quality of any arbitrary completion for any given prompt.', 'start': 
916.417, 'duration': 6.682}, {'end': 923.519, 'text': 'So what we do', 'start': 923.099, 'duration': 0.42}, {'end': 930.882, 'text': 'during reinforcement learning is we basically get again a large collection of prompts and now we do reinforcement learning with respect to the reward model.', 'start': 923.519, 'duration': 7.363}, {'end': 931.742, 'text': "So here's what that looks like.", 'start': 930.882, 'duration': 0.86}, {'end': 937.774, 'text': 'We take a single prompt, we lay it out in rows, and now we use', 'start': 932.813, 'duration': 4.961}, {'end': 943.036, 'text': "basically the model we'd like to train, which is initialized at the SFT model, to create some completions in yellow.", 'start': 937.774, 'duration': 5.262}], 'summary': 'With a trained reward model, any completion can be scored; the SFT-initialized model is then trained with reinforcement learning against it.', 'duration': 26.619, 'max_score': 916.417, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A916417.jpg'}, {'end': 1061.769, 'src': 'embed', 'start': 1034.949, 'weight': 0, 'content': [{'end': 1040.713, 'text': 'Now why would you want to do RLHF? 
So one answer that is kind of not that exciting is that it just works better.', 'start': 1034.949, 'duration': 5.764}, {'end': 1042.535, 'text': 'So this comes from the InstructGPT paper.', 'start': 1040.973, 'duration': 1.562}, {'end': 1045.817, 'text': 'According to these experiments a while ago now,', 'start': 1043.035, 'duration': 2.782}, {'end': 1053.663, 'text': 'these PPO models are RLHF and we see that they are basically just preferred in a lot of comparisons when we give them to humans.', 'start': 1045.817, 'duration': 7.846}, {'end': 1061.769, 'text': 'So humans just prefer basically tokens that come from RLHF models compared to SFT models compared to base model that is prompted to be an assistant.', 'start': 1053.763, 'duration': 8.006}], 'summary': 'RLHF models perform better in human preference tests, according to the InstructGPT paper.', 'duration': 26.82, 'max_score': 1034.949, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A1034949.jpg'}, {'end': 1131.859, 'src': 'embed', 'start': 1108.255, 'weight': 2, 'content': [{'end': 1116.971, 'text': 'this asymmetry makes it so that comparisons are a better way to potentially leverage yourself as a human and your judgment to create a slightly better model.', 'start': 1108.255, 'duration': 8.716}, {'end': 1123.154, 'text': 'Now, RLHF models are not strictly an improvement on the base models in some cases.', 'start': 1118.151, 'duration': 5.003}, {'end': 1126.596, 'text': "So in particular, we've noticed, for example, that they lose some entropy.", 'start': 1123.734, 'duration': 2.862}, {'end': 1129.337, 'text': 'So that means that they give more peaky results.', 'start': 1127.016, 'duration': 2.321}, {'end': 1131.859, 'text': 'They can output lower variations.', 'start': 1129.818, 'duration': 2.041}], 'summary': 'Comparisons can leverage human judgment for a slightly better model, but RLHF models lose some entropy and may produce less varied outputs.', 'duration': 23.604, 
'max_score': 1108.255, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A1108255.jpg'}, {'end': 1219.489, 'src': 'embed', 'start': 1193.481, 'weight': 3, 'content': [{'end': 1200.745, 'text': 'So currently some of the best models, of course, are GPT-4, by far, I would say, followed by Claude, GPT-3.5, and then a number of models.', 'start': 1193.481, 'duration': 7.264}, {'end': 1203.947, 'text': 'Some of these might be available as weights, like Vicuna, Koala, etc.', 'start': 1200.745, 'duration': 3.202}, {'end': 1213.124, 'text': 'And the first three rows here are all RLHF models, and all of the other models, to my knowledge, are SFT models, I believe.', 'start': 1204.938, 'duration': 8.186}, {'end': 1219.489, 'text': "Okay, so that's how we train these models on the high level.", 'start': 1216.267, 'duration': 3.222}], 'summary': 'Top models include GPT-4, Claude, GPT-3.5, and others; the top three are RLHF models and the rest are believed to be SFT models.', 'duration': 26.008, 'max_score': 1193.481, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A1193481.jpg'}], 'start': 770.935, 'title': 'Reinforcement learning for prompt completion and utilization of rlhf models', 'summary': 'Delves into using reinforcement learning for prompt completion, involving comparing and ranking prompt completions with the sft model and discusses training and using rlhf models, along with their advantages over base and sft models and the state of available assistant models and their rankings.', 'chapters': [{'end': 842.518, 'start': 770.935, 'title': 'Reinforcement learning for prompt completion', 'summary': 'Discusses using reinforcement learning from human feedback for prompt completion, involving comparing and ranking prompt completions with the sft model, which can take hours for a single prompt completion pair.', 'duration': 71.583, 'highlights': ['The process involves training models to complete prompts, 
followed by shifting data collection to comparisons of prompt completions, which can take hours for a single pair.', 'Using the SFT model, multiple completions are created and then people are asked to rank these completions, which can be a time-consuming task.', 'The data set consists of comparisons of prompt completions, such as ranking the quality of completions for a given prompt, which can take people hours to complete.', 'The chapter explains the shift from data collection based on following instructions to comparing and ranking prompt completions, which is a time-consuming process.']}, {'end': 1213.124, 'start': 843.493, 'title': 'Training and utilizing rlhf models', 'summary': 'Explains a method for training and using rlhf models, including how the reward model is trained, the reinforcement learning stage, and the advantages of rlhf models over base and sft models, as well as the state of available assistant models and their rankings.', 'duration': 369.631, 'highlights': ['RLHF models are preferred over base and SFT models, as shown in InstructGPT experiments where humans preferred the PPO (RLHF) models in comparisons.', 'The reward model is trained to predict the quality of completions for a given prompt, using a loss function formulated from ground truth rankings, enabling scoring of completions.', 'RLHF models may generate less varied outputs than base models, so base models can be preferable in scenarios requiring diverse outputs, such as generating diverse Pokemon names.', 'RLHF models leverage the asymmetry between the ease of comparing and generating, potentially allowing humans to better judge and improve model outputs.', 'GPT models, particularly GPT-4, are among the best available assistant models, with some others like Vicuna and Koala being SFT models.']}], 'duration': 442.189, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A770935.jpg', 'highlights': ['RLHF models are preferred over base and SFT 
models, as shown in experiments where PPO models are RLHF and are preferred by humans in comparisons.', 'The reward model is trained to predict the quality of completions for a given prompt, using a loss function formulated from ground truth rankings, enabling scoring of completions.', 'RLHF models leverage the asymmetry between the ease of comparing and generating, potentially allowing humans to better judge and improve model outputs.', 'GPT models, particularly GPT-4, are among the best available assistant models, with some others like Vicuna and Koala being SFT models.', 'The process involves training models to complete prompts, followed by shifting data collection to comparisons of prompt completions, which can take hours for a single pair.']}, {'end': 1807.254, 'segs': [{'end': 1456.54, 'src': 'embed', 'start': 1430.182, 'weight': 0, 'content': [{'end': 1434.603, 'text': 'They do actually have a very large fact-based knowledge across a vast number of areas,', 'start': 1430.182, 'duration': 4.421}, {'end': 1437.064, 'text': 'because they have, say, several tens of billions of parameters.', 'start': 1434.603, 'duration': 2.461}, {'end': 1440.085, 'text': "So it's a lot of storage for a lot of facts.", 'start': 1437.064, 'duration': 3.021}, {'end': 1444.787, 'text': 'But they also, I think, have a relatively large and perfect working memory.', 'start': 1440.085, 'duration': 4.702}, {'end': 1446.047, 'text': 'So whatever fits into the,', 'start': 1444.787, 'duration': 1.26}, {'end': 1447.568, 'text': 'whatever fits into the context window', 'start': 1446.047, 'duration': 1.521}, {'end': 1454.998, 'text': "is immediately available to the transformer through its internal self-attention mechanism, and so it's kind of like perfect memory,", 'start': 1448.152, 'duration': 6.846}, {'end': 1456.54, 'text': "but it's got a finite size.", 'start': 1454.998, 'duration': 1.542}], 'summary': 'AI has a vast knowledge base with tens of billions of parameters, enabling access to a large 
amount of facts and a relatively large working memory.', 'duration': 26.358, 'max_score': 1430.182, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A1430182.jpg'}, {'end': 1514.767, 'src': 'embed', 'start': 1485.566, 'weight': 1, 'content': [{'end': 1491.931, 'text': "ESPECIALLY IF YOUR TASKS REQUIRE REASONING, YOU CAN'T EXPECT THE TRANSFORMER TO DO TOO MUCH REASONING PER TOKEN.", 'start': 1485.566, 'duration': 6.365}, {'end': 1495.636, 'text': 'And so you have to really spread out the reasoning across more and more tokens.', 'start': 1492.469, 'duration': 3.167}, {'end': 1500.96, 'text': "So, for example, you can't give a transformer a very complicated question and expect it to get the answer in a single token.", 'start': 1496.158, 'duration': 4.802}, {'end': 1502.921, 'text': "there's just not enough time for it.", 'start': 1500.96, 'duration': 1.961}, {'end': 1509.064, 'text': 'these transformers need tokens to think quote, unquote, I like to say sometimes and so this is some of the things that work well.', 'start': 1502.921, 'duration': 6.143}, {'end': 1514.767, 'text': "you may, for example, have a few shot prompt that shows the transformer that it should like show its work when it's answering a question.", 'start': 1509.064, 'duration': 5.703}], 'summary': 'Transformers require reasoning spread across multiple tokens for complex tasks.', 'duration': 29.201, 'max_score': 1485.566, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A1485566.jpg'}, {'end': 1536.99, 'src': 'heatmap', 'start': 1509.064, 'weight': 1, 'content': [{'end': 1514.767, 'text': "you may, for example, have a few shot prompt that shows the transformer that it should like show its work when it's answering a question.", 'start': 1509.064, 'duration': 5.703}, {'end': 1517.188, 'text': "when it's answering a question, and if you give a few examples,", 'start': 1514.767, 'duration': 
2.421}, {'end': 1523.751, 'text': 'the transformer will imitate that template and it will just end up working out better in terms of its evaluation.', 'start': 1517.188, 'duration': 6.563}, {'end': 1529.423, 'text': "Additionally, you can elicit this kind of behavior from the transformer by saying let's think step by step,", 'start': 1524.659, 'duration': 4.764}, {'end': 1536.99, 'text': 'because this conditions the transformer into sort of like showing its work and because it kind of snaps into a mode of showing its work,', 'start': 1529.423, 'duration': 7.567}], 'summary': 'Prompting the transformer with examples improves its evaluation, while directing it to think step by step elicits the desired behavior.', 'duration': 27.926, 'max_score': 1509.064, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A1509064.jpg'}, {'end': 1641.633, 'src': 'embed', 'start': 1616.146, 'weight': 5, 'content': [{'end': 1620.909, 'text': 'But it turns out that, especially for the bigger models like GPT-4, you can just ask it did you meet the assignment?', 'start': 1616.146, 'duration': 4.763}, {'end': 1624.691, 'text': 'And actually, GPT-4 knows very well that it did not meet the assignment.', 'start': 1621.249, 'duration': 3.442}, {'end': 1626.892, 'text': 'It just kind of got unlucky in its sampling.', 'start': 1624.991, 'duration': 1.901}, {'end': 1629.394, 'text': "And so it will tell you, no, I didn't actually meet the assignment.", 'start': 1627.353, 'duration': 2.041}, {'end': 1630.355, 'text': 'Let me try again.', 'start': 1629.414, 'duration': 0.941}, {'end': 1637.806, 'text': "But without you prompting it, it doesn't know to revisit and so on.", 'start': 1631.095, 'duration': 6.711}, {'end': 1639.569, 'text': 'So you have to make up for that in your prompts.', 'start': 1638.047, 'duration': 1.522}, {'end': 1641.633, 'text': 'You have to get it to check.', 'start': 1640.11, 'duration': 1.523}], 'summary': 'Gpt-4, for 
bigger models, recognizes a failed attempt when asked, but needs prompting to revisit and improve.', 'duration': 25.487, 'max_score': 1616.146, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A1616146.jpg'}, {'end': 1702.503, 'src': 'embed', 'start': 1675.507, 'weight': 2, 'content': [{'end': 1681.833, 'text': 'And in Tree of Thought, the authors of this paper propose maintaining multiple completions for any given prompt.', 'start': 1675.507, 'duration': 6.326}, {'end': 1687.62, 'text': 'And then they are also scoring them along the way and keeping the ones that are going well, if that makes sense.', 'start': 1682.514, 'duration': 5.106}, {'end': 1699.002, 'text': 'And so a lot of people are like really playing around with kind of prompt engineering to basically bring back some of these abilities that we sort of have in our brain for LLMs.', 'start': 1688.24, 'duration': 10.762}, {'end': 1702.503, 'text': 'Now one thing I would like to note here is that this is not just a prompt.', 'start': 1699.942, 'duration': 2.561}], 'summary': 'Authors propose maintaining multiple completions for prompts to enhance LLM abilities.', 'duration': 26.996, 'max_score': 1675.507, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A1675507.jpg'}, {'end': 1750.42, 'src': 'embed', 'start': 1725.178, 'weight': 3, 'content': [{'end': 1731.98, 'text': 'AlphaGo has a policy for placing the next stone when it plays Go, and this policy was trained originally by imitating humans.', 'start': 1725.178, 'duration': 6.802}, {'end': 1737.261, 'text': 'But in addition to this policy, it also does Monte Carlo tree search and, basically,', 'start': 1732.64, 'duration': 4.621}, {'end': 1741.642, 'text': 'it will play out a number of possibilities in its head and evaluate all of them, and only keep the ones that work well.', 'start': 1737.261, 'duration': 4.381}, {'end': 1747.004, 'text': 'And 
so I think this is kind of an equivalent of AlphaGo, but for text, if that makes sense.', 'start': 1742.003, 'duration': 5.001}, {'end': 1750.42, 'text': 'So, just like tree of thought,', 'start': 1748.939, 'duration': 1.481}], 'summary': 'AlphaGo evaluates and selects successful possibilities for moves, similar to a text-based equivalent.', 'duration': 25.242, 'max_score': 1725.178, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A1725178.jpg'}, {'end': 1786.841, 'src': 'embed', 'start': 1762.126, 'weight': 4, 'content': [{'end': 1768.629, 'text': 'So on the right I have an example from this paper called ReAct, where they structure the answer to a prompt', 'start': 1762.126, 'duration': 6.503}, {'end': 1778.512, 'text': "as a sequence of thought, action, observation, thought, action, observation, and it's a full rollout kind of thinking process to answer the query,", 'start': 1769.116, 'duration': 9.396}, {'end': 1781.537, 'text': 'and in these actions the model is also allowed to use tools.', 'start': 1778.512, 'duration': 3.025}, {'end': 1786.841, 'text': 'On the left I have an example of AutoGPT, and now AutoGPT, by the way,', 'start': 1782.478, 'duration': 4.363}], 'summary': 'Example of a structured thought-action-observation response sequence (ReAct) and AutoGPT usage.', 'duration': 24.715, 'max_score': 1762.126, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A1762126.jpg'}], 'start': 1216.267, 'title': 'Gpt model and system 2 recreations', 'summary': 'Delves into the cognitive disparities between human reasoning and gpt models, highlighting the need for spreading reasoning across more tokens and using prompting techniques. 
it also explores recreating the slower, deliberate thinking process of system 2 in language models, with a focus on prompt engineering and python glue code.', 'chapters': [{'end': 1637.806, 'start': 1216.267, 'title': 'Gpt model and human cognitive differences', 'summary': 'Discusses the cognitive differences between human reasoning and gpt models, noting that gpt models lack the ability to reflect, sanity check, or correct mistakes, but have a vast fact-based knowledge and a large working memory capacity, which prompts the need to spread out reasoning across more tokens and use prompting techniques to elicit desired behavior.', 'duration': 421.539, 'highlights': ['GPT models lack the ability to reflect, sanity check, or correct mistakes', 'GPT models have a vast fact-based knowledge and a large working memory capacity', 'The need to spread out reasoning across more tokens and use prompting techniques']}, {'end': 1807.254, 'start': 1638.047, 'title': 'Recreating system 2 for llms', 'summary': 'Discusses techniques for improving language models by recreating the slower, deliberate thinking process of system 2, with examples from recent papers and parallels to alphago, emphasizing the use of prompt engineering and python glue code.', 'duration': 169.207, 'highlights': ['Tree of Thought proposes maintaining multiple completions for any given prompt and scoring them to keep the ones that work well.', "AlphaGo's policy for placing the next stone is trained by imitating humans and evaluates multiple possibilities, similar to a text equivalent.", 'Examples of prompt engineering involve structuring the answer as a sequence of thought-action-observation and using AutoGPT for recursive task breakdown.']}], 'duration': 590.987, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A1216267.jpg', 'highlights': ['GPT models have a vast fact-based knowledge and a large working memory capacity', 'The need to spread out reasoning 
across more tokens and use prompting techniques', 'Tree of Thought proposes maintaining multiple completions for any given prompt and scoring them', "AlphaGo's policy for placing the next stone is trained by imitating humans and evaluates multiple possibilities", 'Examples of prompt engineering involve structuring the answer as a sequence of thought-action-observation and using AutoGPT for recursive task breakdown', 'GPT models lack the ability to reflect, sanity check, or correct mistakes']}, {'end': 2077.016, 'segs': [{'end': 1909.854, 'src': 'embed', 'start': 1881.016, 'weight': 0, 'content': [{'end': 1883.519, 'text': "And so it's kind of like conditioning on getting a right answer.", 'start': 1881.016, 'duration': 2.503}, {'end': 1889.366, 'text': "And this actually makes the transformer work better, because the transformer doesn't have to now hedge its probability", 'start': 1883.859, 'duration': 5.507}, {'end': 1892.369, 'text': 'mass on low-quality solutions, as ridiculous as that sounds.', 'start': 1889.366, 'duration': 3.003}, {'end': 1897.071, 'text': 'And so basically, feel free to ask for a strong solution.', 'start': 1892.99, 'duration': 4.081}, {'end': 1899.552, 'text': 'Say something like you are a leading expert on this topic.', 'start': 1897.391, 'duration': 2.161}, {'end': 1901.652, 'text': 'Pretend you have IQ 120, etc.', 'start': 1899.912, 'duration': 1.74}, {'end': 1909.854, 'text': "But don't try to ask for too much IQ, because if you ask for IQ like 400, you might be out of data distribution or, even worse,", 'start': 1902.212, 'duration': 7.642}], 'summary': 'Conditioning on a right answer improves transformer performance by avoiding hedging its probability mass on low-quality solutions.', 'duration': 28.838, 'max_score': 1881.016, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A1881016.jpg'}, {'end': 1973.5, 'src': 'embed', 'start': 1947.426, 'weight': 1, 'content': [{'end': 
1953.009, 'text': "One thing to keep in mind again is that these transformers by default may not know what they don't know.", 'start': 1947.426, 'duration': 5.583}, {'end': 1955.39, 'text': 'So you may even want to tell the transformer in the prompt:', 'start': 1953.009, 'duration': 2.381}, {'end': 1961.153, 'text': 'you are not very good at mental arithmetic. Whenever you need to do very large number addition, multiplication, or whatever,', 'start': 1955.39, 'duration': 5.763}, {'end': 1962.634, 'text': 'instead, use this calculator.', 'start': 1961.153, 'duration': 1.481}, {'end': 1965.896, 'text': "Here's how you use the calculator: use this token combination, etc.", 'start': 1962.634, 'duration': 3.262}, {'end': 1971.839, 'text': "So you have to actually spell it out, because the model by default doesn't know what it's good at or not good at, necessarily,", 'start': 1965.896, 'duration': 5.943}, {'end': 1973.5, 'text': 'just like you and I might be.', 'start': 1971.839, 'duration': 1.661}], 'summary': 'Transformers may need specific prompts to perform well, just like humans do.', 'duration': 26.074, 'max_score': 1947.426, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A1947426.jpg'}, {'end': 2013.169, 'src': 'embed', 'start': 1986.257, 'weight': 2, 'content': [{'end': 1992.399, 'text': "But actually there's this entire space in between of these retrieval-augmented models, and this works extremely well in practice.", 'start': 1986.257, 'duration': 6.142}, {'end': 1996.84, 'text': 'As I mentioned, the context window of a transformer is its working memory.', 'start': 1993.319, 'duration': 3.521}, {'end': 2001.402, 'text': 'If you can load the working memory with any information that is relevant to the task,', 'start': 1997.2, 'duration': 4.202}, {'end': 2005.563, 'text': 'the model will work extremely well because it can immediately access all that memory.', 'start': 2001.402, 
'duration': 4.161}, {'end': 2013.169, 'text': 'And so I think a lot of people are really interested in, basically, retrieval-augmented generation,', 'start': 2006.465, 'duration': 6.704}], 'summary': "Retrieval-augmented models work well by loading relevant information into the model's working memory.", 'duration': 26.912, 'max_score': 1986.257, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A1986257.jpg'}], 'start': 1807.254, 'title': 'Transformer model optimization', 'summary': 'Highlights the importance of conditioning LLMs to produce accurate results and discusses how transformers can benefit from tools such as calculators and retrieval-augmented models to enhance their problem-solving capabilities.', 'chapters': [{'end': 1923.603, 'start': 1807.254, 'title': 'Optimizing LLM performance', 'summary': "Highlights the importance of conditioning LLMs to produce accurate results by requesting strong solutions and avoiding excessive IQ requests, as it leads to improved performance in transformer models, as evidenced by the study's experimentation with prompts and conditioning for right answers.", 'duration': 116.349, 'highlights': ["Conditioning LLMs to provide accurate results by requesting strong solutions and avoiding excessive IQ requests leads to improved performance in transformer models, as evidenced by the study's experimentation with prompts and conditioning for right answers.", 'Transformers are trained on language modeling and default to imitating low and high-quality solutions, making it necessary to request good performance at test time to ensure accurate results.', 'The study found that conditioning the transformer on obtaining the correct answer leads to better performance, as it eliminates the need for the transformer to allocate probability mass on low-quality solutions.']}, {'end': 2077.016, 'start': 1924.684, 'title': 'Transformers and memory in problem solving', 'summary': 'Discusses how 
transformers can benefit from tools such as calculators and retrieval-augmented models to enhance their problem-solving capabilities, with a focus on utilizing working memory and referencing external documents for improved performance.', 'duration': 152.332, 'highlights': ['Transformers can benefit from tools like calculators and code interpreters to aid in computation, improving their problem-solving capabilities.', 'Retrieval-augmented models can significantly enhance the performance of transformers by utilizing relevant working memory and referencing external data sources.', 'Utilizing a retrieval-augmented generation approach, transformers can effectively access and utilize large amounts of external data for problem-solving tasks.']}], 'duration': 269.762, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A1807254.jpg', 'highlights': ['Conditioning LLMs on obtaining correct answers leads to better performance', 'Transformers can benefit from tools like calculators and code interpreters', 'Retrieval-augmented models significantly enhance transformer performance', 'Avoiding excessive IQ requests leads to improved performance in transformer models']}, {'end': 2492.211, 'segs': [{'end': 2110.141, 'src': 'embed', 'start': 2077.016, 'weight': 1, 'content': [{'end': 2079.737, 'text': 'Next, I wanted to briefly talk about constraint prompting.', 'start': 2077.016, 'duration': 2.721}, {'end': 2082.239, 'text': 'I also find this very interesting.', 'start': 2079.737, 'duration': 2.502}, {'end': 2090.326, 'text': 'This is basically techniques for forcing a certain template in the outputs of LLMs.', 'start': 2082.239, 'duration': 8.087}, {'end': 2092.467, 'text': 'So Guidance is one example, from Microsoft', 'start': 2090.326, 'duration': 2.141}, {'end': 2097.932, 'text': 'actually. And here we are enforcing that the output from the LLM will be JSON,', 'start': 2092.467, 'duration': 5.465}, {'end': 2101.635, 'text': 
'and this will actually guarantee that the output will take on this form,', 'start': 2097.932, 'duration': 3.703}, {'end': 2110.141, 'text': 'because they go in and they mess with the probabilities of all the different tokens that come out of the transformer and they clamp those tokens and then the transformer is only filling in the blanks here,', 'start': 2101.635, 'duration': 8.506}], 'summary': "Constraint prompting enforces a specific output template in LLMs; for example, Microsoft's Guidance can force JSON output, guaranteeing the desired output format.", 'duration': 33.125, 'max_score': 2077.016, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A2077016.jpg'}, {'end': 2165.512, 'src': 'embed', 'start': 2134.43, 'weight': 0, 'content': [{'end': 2137.351, 'text': 'It is becoming a lot more accessible to do this in practice,', 'start': 2134.43, 'duration': 2.921}, {'end': 2143.24, 'text': "and that's because of a number of techniques that have been developed and have libraries, very recently.", 'start': 2137.351, 'duration': 5.889}, {'end': 2147.762, 'text': "So, for example, parameter-efficient fine-tuning techniques like LoRA make sure that", 'start': 2143.24, 'duration': 4.522}, {'end': 2151.124, 'text': "you're only training small, sparse pieces of your model.", 'start': 2147.762, 'duration': 3.362}, {'end': 2156.027, 'text': 'So most of the model is kept clamped at the base model and some pieces of it are allowed to change,', 'start': 2151.124, 'duration': 4.903}, {'end': 2161.17, 'text': 'and this still works pretty well empirically and makes it much cheaper to sort of tune only small pieces of your model.', 'start': 2156.027, 'duration': 5.143}, {'end': 2165.512, 'text': "It also means that because most of your model is clamped,", 'start': 2162.43, 'duration': 3.082}], 'summary': 'New techniques enable parameter-efficient fine-tuning, reducing training costs for models.', 'duration': 31.082, 'max_score': 2134.43, 'thumbnail': 
'max_score': 2134.43, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A2134430.jpg'}, {'end': 2215.44, 'src': 'embed', 'start': 2184.412, 'weight': 5, 'content': [{'end': 2185.353, 'text': 'Something to keep in mind', 'start': 2184.412, 'duration': 0.941}, {'end': 2189.717, 'text': 'is that basically fine-tuning is a lot more technically involved.', 'start': 2185.353, 'duration': 4.364}, {'end': 2190.678, 'text': 'It requires a lot more,', 'start': 2189.717, 'duration': 0.961}, {'end': 2192.841, 'text': 'I think, technical expertise to do right.', 'start': 2190.678, 'duration': 2.163}, {'end': 2196.965, 'text': 'It requires human data contractors for datasets and/or synthetic data pipelines', 'start': 2192.841, 'duration': 4.124}, {'end': 2198.767, 'text': 'that can be pretty complicated.', 'start': 2196.965, 'duration': 1.802}, {'end': 2203.733, 'text': 'This will definitely slow down your iteration cycle by a lot. And I would say, on a high level, SFT', 'start': 2198.767, 'duration': 4.966}, {'end': 2208.155, 'text': 'is achievable, because you are just continuing the language modeling task.', 'start': 2204.393, 'duration': 3.762}, {'end': 2209.436, 'text': "It's relatively straightforward.", 'start': 2208.336, 'duration': 1.1}, {'end': 2215.44, 'text': 'But RLHF, I would say, is very much research territory and is even much harder to get to work.', 'start': 2209.857, 'duration': 5.583}], 'summary': 'Fine-tuning requires technical expertise and can slow down the iteration cycle. 
SFT is achievable; RLHF is research territory.', 'duration': 31.028, 'max_score': 2184.412, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A2184412.jpg'}, {'end': 2391.072, 'src': 'embed', 'start': 2362.796, 'weight': 3, 'content': [{'end': 2366.417, 'text': 'And so I would keep that definitely in mind for all your applications.', 'start': 2362.796, 'duration': 3.621}, {'end': 2370.76, 'text': "This, by the way, could be an entire talk, so I don't have time to cover it in full detail.", 'start': 2366.417, 'duration': 4.343}, {'end': 2371.98, 'text': 'Models may be biased.', 'start': 2370.76, 'duration': 1.22}, {'end': 2374.041, 'text': 'They may fabricate (hallucinate) information.', 'start': 2371.98, 'duration': 2.061}, {'end': 2375.062, 'text': 'They may have reasoning errors.', 'start': 2374.041, 'duration': 1.021}, {'end': 2378.704, 'text': 'They may struggle in entire classes of applications.', 'start': 2375.842, 'duration': 2.862}, {'end': 2381.626, 'text': 'They have knowledge cutoffs, so they might not know any information', 'start': 2378.704, 'duration': 2.922}, {'end': 2384.027, 'text': 'above, say, September 2021.', 'start': 2381.626, 'duration': 2.401}, {'end': 2391.072, 'text': 'They are susceptible to a large range of attacks, which are sort of like coming out on Twitter daily, including prompt injection, jailbreak attacks,', 'start': 2384.027, 'duration': 7.045}], 'summary': 'Models may be biased and susceptible to a range of attacks, affecting their knowledge and performance in various applications.', 'duration': 28.276, 'max_score': 2362.796, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A2362796.jpg'}, {'end': 2428.832, 'src': 'embed', 'start': 2400.458, 'weight': 2, 'content': [{'end': 2407.401, 'text': 'Use them as a source of inspiration and suggestions, and think copilots instead of completely autonomous agents that are just 
like performing a task somewhere.', 'start': 2400.458, 'duration': 6.943}, {'end': 2410.303, 'text': "It's just not clear that the models are there right now.", 'start': 2408.102, 'duration': 2.201}, {'end': 2415.025, 'text': 'So I wanted to close by saying that GPT-4 is an amazing artifact.', 'start': 2412.544, 'duration': 2.481}, {'end': 2416.346, 'text': "I'm very thankful that it exists.", 'start': 2415.205, 'duration': 1.141}, {'end': 2418.327, 'text': "And it's beautiful.", 'start': 2417.046, 'duration': 1.281}, {'end': 2420.248, 'text': 'It has a ton of knowledge across so many areas.', 'start': 2418.387, 'duration': 1.861}, {'end': 2422.629, 'text': 'It can do math, code, and so on.', 'start': 2420.468, 'duration': 2.161}, {'end': 2428.832, 'text': "And in addition, there's this thriving ecosystem of everything else that is being built and incorporated into the ecosystem.", 'start': 2423.25, 'duration': 5.582}], 'summary': 'GPT-4 is an amazing artifact with broad capabilities and a thriving ecosystem.', 'duration': 28.374, 'max_score': 2400.458, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A2400458.jpg'}], 'start': 2077.016, 'title': 'LLMs and GPT-4 utilization', 'summary': 'Discusses constraint prompting and fine-tuning in LLMs, emphasizing the need for top performance, as well as the importance of detailed prompts and prompt engineering techniques for effective utilization of GPT-4, highlighting the need for human oversight and suggesting low-stakes applications.', 'chapters': [{'end': 2247.294, 'start': 2077.016, 'title': 'Constraint prompting and fine-tuning in LLMs', 'summary': 'Discusses techniques like constraint prompting for enforcing specific outputs in LLMs, along with the technical involvement and challenges of fine-tuning models, emphasizing the importance of achieving top performance before optimization.', 'duration': 170.278, 'highlights': ["Constraint prompting techniques enforce 
specific outputs in LLMs, such as using Guidance from Microsoft to ensure the output is in JSON format, guaranteeing the output's structure.", "Fine-tuning models involves changing the model's weights, and recent techniques like parameter-efficient methods make it more accessible, allowing training of small, sparse model pieces, leading to efficient computation and cheaper tuning.", 'Fine-tuning requires technical expertise, human data contractors, and complex data pipelines, significantly slowing down the iteration cycle, while RLHF implementation is challenging and not recommended for beginners.']}, {'end': 2492.211, 'start': 2247.714, 'title': 'Utilizing GPT-4 for prompt engineering', 'summary': 'The chapter emphasizes the importance of detailed prompts and prompt engineering techniques to effectively utilize GPT-4, highlighting the need for human oversight due to limitations such as biases, reasoning errors, and susceptibility to attacks, and recommends using GPT-4 in low-stakes applications.', 'duration': 244.497, 'highlights': ['The chapter emphasizes the importance of detailed prompts and prompt engineering techniques to effectively utilize GPT-4.', 'Human oversight is recommended due to limitations such as biases, reasoning errors, and susceptibility to attacks.', 'Recommendation to use GPT-4 in low-stakes applications and as a source of inspiration, with human oversight and caution.']}], 'duration': 415.195, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bZQun8Y4L2A/pics/bZQun8Y4L2A2077016.jpg', 'highlights': ["Fine-tuning models involves changing the model's weights, and recent techniques like parameter-efficient methods make it more accessible, allowing training of small, sparse model pieces, leading to efficient computation and cheaper tuning.", "Constraint prompting techniques enforce specific outputs in LLMs, such as using Guidance from Microsoft to ensure the output is in JSON format, guaranteeing the output's structure.", 'The 
chapter emphasizes the importance of detailed prompts and prompt engineering techniques to effectively utilize GPT-4.', 'Human oversight is recommended due to limitations such as biases, reasoning errors, and susceptibility to attacks.', 'Recommendation to use GPT-4 in low-stakes applications and as a source of inspiration, with human oversight and caution.', 'Fine-tuning requires technical expertise, human data contractors, and complex data pipelines, significantly slowing down the iteration cycle, while RLHF implementation is challenging and not recommended for beginners.']}], 'highlights': ["The LLaMA model has 65 billion parameters and is trained on 1.4 trillion tokens, far more tokens than GPT-3, which has 175 billion parameters but was trained on 300 billion tokens.", 'The pre-training stage is the most computationally intensive, requiring internet-scale datasets, thousands of GPUs, and months of training.', 'RLHF models are preferred over base and SFT models: in experiments, the PPO models (those trained with RLHF) are preferred by humans in comparisons.', 'GPT models have a vast fact-based knowledge and a large working memory capacity', "Fine-tuning models involves changing the model's weights, and recent techniques like parameter-efficient methods make it more accessible, allowing training of small, sparse model pieces, leading to efficient computation and cheaper tuning."]}