title
AlphaZero from Scratch – Machine Learning Tutorial

description
In this machine learning course, you will learn how to build AlphaZero from scratch. AlphaZero is a game-playing algorithm that uses artificial intelligence and machine learning techniques to learn how to play board games at a superhuman level.

🔗 Trained Models + Code for each Chapter: https://github.com/foersterrobert/AlphaZeroFromScratch
🔗 AlphaZero-Paper: https://arxiv.org/pdf/1712.01815.pdf

✏️ Robert Förster created this course. Website: https://robertfoerster.com/

⭐️ Contents ⭐️
⌨️ (0:00:00) Introduction
⌨️ (0:01:35) Overview – Part 1
⌨️ (0:05:43) MCTS-Explained
⌨️ (0:27:03) AlphaMCTS-Explained
⌨️ (0:39:05) Overview – Part 2
⌨️ (0:45:14) Chapter 1: TicTacToe
⌨️ (1:00:32) Chapter 2: MCTS
⌨️ (1:34:54) Chapter 3: Model
⌨️ (2:03:09) Chapter 4: AlphaMCTS
⌨️ (2:16:39) Chapter 5: AlphaSelfPlay
⌨️ (2:35:13) Chapter 6: AlphaTrain
⌨️ (2:47:15) Chapter 7: AlphaTweaks
⌨️ (3:08:18) Chapter 8: ConnectFour
⌨️ (3:21:48) Chapter 9: AlphaParallel
⌨️ (3:55:59) Chapter 10: Eval

🎉 Thanks to our Champion and Sponsor supporters:
👾 Nattira Maneerat
👾 Heather Wcislo
👾 Serhiy Kalinets
👾 Erdeniz Unvan
👾 Justin Hual
👾 Agustín Kussrow
👾 Otis Morgan

--

Learn to code for free and get a developer job: https://www.freecodecamp.org
Read hundreds of articles on programming: https://freecodecamp.org/news

detail
{'title': 'AlphaZero from Scratch – Machine Learning Tutorial', 'heatmap': [{'end': 1785.584, 'start': 1636.121, 'weight': 0.861}, {'end': 2684.9, 'start': 2524.759, 'weight': 0.741}, {'end': 5804.363, 'start': 5354.472, 'weight': 0.828}, {'end': 7144.569, 'start': 6990.324, 'weight': 0.779}, {'end': 8181.447, 'start': 8024.318, 'weight': 0.883}, {'end': 9232.097, 'start': 9069.489, 'weight': 0.802}, {'end': 9966.305, 'start': 9817.709, 'weight': 1}, {'end': 11159.255, 'start': 11003.559, 'weight': 0.756}, {'end': 14871.727, 'start': 14718.443, 'weight': 0.802}], 'summary': "Tutorial 'alphazero from scratch – machine learning tutorial' covers creating alphazero from scratch using python and pytorch, exploring its superhuman performance in board games, discussing the monte carlo tree search algorithm, and providing practical examples and quantifiable data for tic-tac-toe and connect four games.", 'chapters': [{'end': 181.408, 'segs': [{'end': 41.652, 'src': 'embed', 'start': 18.186, 'weight': 0, 'content': [{'end': 25.388, 'text': 'In this video, we are going to rebuild AlphaZero completely from scratch using Python and the deep learning framework PyTorch.', 'start': 18.186, 'duration': 7.202}, {'end': 34.99, 'text': 'AlphaZero was initially developed by DeepMind and it is able to achieve magnificent performance in extremely complex board games such as Go,', 'start': 26.128, 'duration': 8.862}, {'end': 41.652, 'text': 'where the amount of legal board positions is actually significantly higher than the amount of atoms that are in our universe.', 'start': 34.99, 'duration': 6.662}], 'summary': 'Rebuilding alphazero using python and pytorch, achieves magnificent performance in complex board games.', 'duration': 23.466, 'max_score': 18.186, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB418186.jpg'}, {'end': 116.667, 'src': 'embed', 'start': 86.33, 'weight': 3, 'content': [{'end': 93.595, 'text': 'so that you will also understand how flexible alpha zero really is when it comes to adapting it to various domains.', 'start': 86.33, 'duration': 7.265}, {'end': 95.996, 'text': "so let's get started, Okay great.", 'start': 93.595, 'duration': 2.401}, {'end': 99.938, 'text': "So let's start with a brief overview of the AlphaZero algorithm.", 'start': 96.036, 'duration': 3.902}, {'end': 104.461, 'text': 'So first of all, it is important that we have two separate components.', 'start': 100.619, 'duration': 3.842}, {'end': 108.082, 'text': 'And on one hand, we have the self-play part right here.', 'start': 105.221, 'duration': 2.861}, {'end': 116.667, 'text': 'And during this phase, our AlphaZero model basically plays with itself in order to gather some information about the game.', 'start': 108.923, 'duration': 7.744}], 'summary': 'Alphazero is flexible in adapting to domains, using self-play to gather game information.', 'duration': 30.337, 'max_score': 86.33, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB486330.jpg'}], 'start': 0.169, 'title': 'Alphazero in pytorch', 'summary': 'Covers creating alphazero from scratch using python and pytorch, highlighting its superhuman performance in complex board games, its ability to learn by playing with itself, and its self-optimization cycle to outperform humans in specific games. 
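The overview above splits AlphaZero into a self-play phase that generates game data and a training phase that optimizes the model on that data, repeated over many iterations. A minimal sketch of that outer cycle; the class and method names (AlphaZero, self_play, train) and the keys in args are illustrative assumptions, not necessarily the tutorial's exact code:

```python
class AlphaZero:
    def __init__(self, model, optimizer, game, args):
        self.model = model          # policy/value network
        self.optimizer = optimizer
        self.game = game
        self.args = args            # assumed keys: num_iterations, num_selfPlay_iterations, num_epochs

    def learn(self):
        for _ in range(self.args['num_iterations']):
            memory = []
            # self-play: the current model plays against itself to gather (state, policy, outcome) data
            for _ in range(self.args['num_selfPlay_iterations']):
                memory += self.self_play()      # sketched further below
            # training: optimize the model on the gathered data
            for _ in range(self.args['num_epochs']):
                self.train(memory)              # sketched further below
            # the improved model is then used for the next round of self-play
```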
it also explores the recent alphatensor paper by deepmind and provides an understanding of the alphazero algorithm, along with building, training, and evaluating it on tic-tac-toe and connect four games.', 'chapters': [{'end': 65.166, 'start': 0.169, 'title': 'Creating alphazero with pytorch', 'summary': 'Covers how to create alphazero from scratch using python and pytorch, highlighting its superhuman performance in complex board games like go, where legal board positions outnumber atoms in the universe, its ability to learn by playing with itself, and its versatility in playing chess and shogi.', 'duration': 64.997, 'highlights': ['AlphaZero achieves magnificent performance in extremely complex board games such as Go, where the amount of legal board positions is significantly higher than the amount of atoms in our universe.', 'The machine learning system of AlphaZero learns all information just by playing with itself.', 'The algorithm can play chess and shogi in a very impressive way.']}, {'end': 130.756, 'start': 65.166, 'title': 'Alphazero algorithm overview', 'summary': "Explores the recent alphatensor paper by deepmind, demonstrating alphazero's ability to invent novel algorithms in mathematics, and provides an understanding of the alphazero algorithm, along with building, training, and evaluating it on tic-tac-toe and connect four games.", 'duration': 65.59, 'highlights': ["The recent AlphaTensor paper by DeepMind showcased AlphaZero's capability to invent novel algorithms within mathematics.", 'The chapter provides an understanding of the AlphaZero algorithm and demonstrates its flexibility in adapting to various domains by building, training, and evaluating it on tic-tac-toe and connect four games.', 'During the self-play phase, the AlphaZero model plays with itself to gather information about the game and generate data for training.', 'The generated data from the self-play phase is utilized for training the AlphaZero model, highlighting its iterative learning process.']}, {'end': 181.408, 'start': 130.756, 'title': 'Alphazero self-optimization cycle', 'summary': 'Discusses the iterative process of alphazero model self-optimization through playing with itself, optimizing based on gained information, and repeating the cycle until reaching a neural network capable of outperforming humans in a specific game.', 'duration': 50.652, 'highlights': ['The AlphaZero model iteratively optimizes itself by playing with itself and using gained information, repeating the cycle n number of times to reach a neural network capable of outperforming humans.', 'The model aims to play a certain game better than any human by leveraging self-play, information optimization, and repetition of the cycle.']}], 'duration': 181.239, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB4169.jpg', 'highlights': ['AlphaZero achieves magnificent performance in extremely complex board games such as Go, where the amount of legal board positions is significantly higher than the amount of atoms in our universe.', 'The machine learning system of AlphaZero learns all information just by playing with itself.', 'The algorithm can play chess and shogi in a very impressive way.', "The recent AlphaTensor paper by DeepMind showcased AlphaZero's capability to invent novel algorithms within mathematics.", 'The generated data from the self-play phase is utilized for training the AlphaZero model, highlighting its iterative learning process.', 'The AlphaZero model iteratively 
optimizes itself by playing with itself and using gained information, repeating the cycle n number of times to reach a neural network capable of outperforming humans.']}, {'end': 1622.337, 'segs': [{'end': 1294.174, 'src': 'embed', 'start': 1265.598, 'weight': 0, 'content': [{'end': 1267.839, 'text': 'So first of all, we want to do selection again.', 'start': 1265.598, 'duration': 2.241}, {'end': 1271.26, 'text': 'So we want to walk down until we have reached the leaf node.', 'start': 1268.399, 'duration': 2.861}, {'end': 1274.379, 'text': 'And now our root node is fully expanded.', 'start': 1272.017, 'duration': 2.362}, {'end': 1279.683, 'text': 'So we have to calculate the UCB score for each of our children right here.', 'start': 1274.979, 'duration': 4.704}, {'end': 1284.367, 'text': 'And then we have to select the child that has the highest UCB score right?', 'start': 1280.364, 'duration': 4.003}, {'end': 1294.174, 'text': 'So when we calculate the UCB score for both of them, we can see that the UCB score here is higher because we have won the game in this case.', 'start': 1285.347, 'duration': 8.827}], 'summary': "Select child with highest ucb score after evaluating root node's children.", 'duration': 28.576, 'max_score': 1265.598, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB41265598.jpg'}, {'end': 1528.865, 'src': 'embed', 'start': 1490.291, 'weight': 1, 'content': [{'end': 1491.611, 'text': 'Sorry, wrong column.', 'start': 1490.291, 'duration': 1.32}, {'end': 1504.783, 'text': 'So, W four, and N4 here being our number of wins and our visit count.', 'start': 1493.372, 'duration': 11.411}, {'end': 1508.445, 'text': "And let's set them to zero like this.", 'start': 1506.724, 'duration': 1.721}, {'end': 1514.527, 'text': 'So now we also work in this direction right here.', 'start': 1512.126, 'duration': 2.401}, {'end': 1520.49, 'text': 'And again, after expansion, we can now move to simulation.', 'start': 1515.468, 'duration': 5.022}, {'end': 1525.372, 'text': 'So when we move here, the game is terminal.', 'start': 1521.03, 'duration': 4.342}, {'end': 1528.865, 'text': 'And in this case, we can just say that a draw has occurred.', 'start': 1526.004, 'duration': 2.861}], 'summary': 'Setting w=0, n4=0 for wins and visit count, moving to simulation, game terminal, resulting in a draw.', 'duration': 38.574, 'max_score': 1490.291, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB41490291.jpg'}, {'end': 1631.358, 'src': 'embed', 'start': 1603.268, 'weight': 4, 'content': [{'end': 1606.77, 'text': 'And you set this number manually at the beginning.', 'start': 1603.268, 'duration': 3.502}, {'end': 1610.18, 'text': 'And alternatively, you could also set a time.', 'start': 1607.777, 'duration': 2.403}, {'end': 1614.426, 'text': 'And during this time, you would just perform as many iterations as you can.', 'start': 1610.2, 'duration': 4.226}, {'end': 1618.552, 'text': 'But in this case, for example, you could just stop it for iterations.', 'start': 1615.287, 'duration': 3.265}, {'end': 1622.337, 'text': 'But in practice, you might run for thousands of iterations.', 'start': 1618.672, 'duration': 3.665}, {'end': 1631.358, 'text': 'Okay, so now we can also look at the way how our Monte Carlo tree search changes when we adapted to this general alpha zero algorithm.', 'start': 1623.271, 'duration': 8.087}], 'summary': 'The algorithm can run for thousands of iterations and adapt to the 
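The selection walk described above always moves to the child with the highest UCB score, trading the observed win ratio against how rarely a child has been visited. A common textbook form of that score (the video's exact rescaling of the win ratio may differ slightly):

$$\text{UCB}(c) \;=\; \frac{W_c}{N_c} \;+\; C\,\sqrt{\frac{\ln N_{\text{parent}}}{N_c}}$$

where $W_c$ is the child's accumulated win count, $N_c$ its visit count, $N_{\text{parent}}$ the parent's visit count, and $C$ the exploration constant.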
general alphazero algorithm.', 'duration': 28.09, 'max_score': 1603.268, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB41603268.jpg'}], 'start': 182.488, 'title': 'Neural network and mcts in tic-tac-toe', 'summary': 'Discusses neural network architecture for tic-tac-toe, emphasizing input, policy, and value output, and then explains the monte carlo tree search algorithm, highlighting selection, expansion, and backpropagation phases to maximize winning ratio and minimize visits, resulting in a total of five wins and seven visits at the root node.', 'chapters': [{'end': 333.344, 'start': 182.488, 'title': 'Neural network architecture for tic-tac-toe', 'summary': 'Discusses the architecture of a neural network for tic-tac-toe, outlining the input, policy, and value output, and their significance in determining optimal game moves and state evaluation.', 'duration': 150.856, 'highlights': ['The neural network architecture involves taking the game state as input and producing a policy and a value as output, where the policy determines the promising actions based on the state, and the value indicates the desirability of the state for the player (quantifiable: policy, value, state input).', 'The policy output of the neural network guides the player on where to make a move based on the state, while the value output measures the desirability of the state for the player, typically ranging from -1 to 1 (quantifiable: policy, value, state input).', 'The example illustrates how the neural network should provide a policy that guides the player to a winning move and a high value for a state where a win is achievable, demonstrating the effectiveness of the model (quantifiable: policy, value, state input).', 'The value output of the neural network aims to be close to positive one for optimal states, signifying their desirability for the player (quantifiable: value, optimal state).', "The chapter also hints at the upcoming discussion on self-play and training aspects of the model, indicating a thorough exploration of the neural network's functionality (quantifiable: self-play, training part)."]}, {'end': 559.288, 'start': 333.784, 'title': 'Monte carlo tree search', 'summary': 'Explains the monte carlo tree search algorithm, which involves creating a tree structure to determine the most promising action based on win ratios, visit counts, and future states.', 'duration': 225.504, 'highlights': ['The Monte Carlo Tree Search algorithm involves creating a tree structure where each node stores state, winning count (w), and visit count (n), enabling the determination of the most promising action based on win ratios and visit counts.', 'The algorithm calculates the winning ratio for each node based on the total number of wins achieved and the total visit count, allowing the selection of the most promising action for future moves.', 'By analyzing the children of the root node and calculating winning ratios, the algorithm identifies the action that looks the most promising, guiding decision-making in games like tic-tac-toe.']}, {'end': 815.324, 'start': 560.362, 'title': 'Monte carlo tree search', 'summary': 'Explains the monte carlo tree search algorithm, emphasizing the selection phase, where nodes are expanded based on the ucb formula to maximize winning ratio and minimize visits, and the subsequent expansion phase, which adds new nodes to the tree with initial winning and visit counts of zero.', 'duration': 254.962, 'highlights': ['The selection phase 
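The neural-network summary above boils down to one interface: a board state goes in, a policy over the nine cells and a scalar value in [-1, 1] come out. A minimal PyTorch sketch of that interface; the tutorial's actual model is a small residual network, so the layer sizes and the three-plane input encoding here are illustrative assumptions:

```python
import torch.nn as nn

class PolicyValueNet(nn.Module):
    """Maps a tic-tac-toe state to (policy logits over 9 cells, value in [-1, 1])."""
    def __init__(self, num_hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(3 * 3 * 3, num_hidden),   # assumed encoding: 3 planes (player 1, empty, player -1)
            nn.ReLU(),
            nn.Linear(num_hidden, num_hidden),
            nn.ReLU(),
        )
        self.policy_head = nn.Linear(num_hidden, 9)                            # where to play
        self.value_head = nn.Sequential(nn.Linear(num_hidden, 1), nn.Tanh())   # how good the state is

    def forward(self, x):
        x = self.body(x.flatten(1))             # x: (batch, 3, 3, 3)
        return self.policy_head(x), self.value_head(x)
```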
involves walking down the tree by choosing the child with the highest UCB formula, maximizing winning ratio and minimizing visits. The direction chosen for walking down the tree is determined by selecting the child with the highest UCB formula, which reflects a balance between maximizing winning ratio and minimizing visits, guiding the selection phase.', 'Nodes are expanded based on the UCB formula to maximize winning ratio and minimize visits. Expansion involves creating new nodes in the tree, guided by the UCB formula to optimize winning ratio and minimize visits for efficient exploration.', 'Newly created nodes in the expansion phase have initial winning and visit counts of zero. Upon expansion, newly created nodes are initialized with zero winning count and zero visit count, as they have not been explored or visited prior to this phase.']}, {'end': 1263.658, 'start': 816.821, 'title': 'Mcts algorithm: iteration and backpropagation', 'summary': 'Describes the mcts process, including the expansion, simulation, and backpropagation phases, with a focus on updating win and visit counts during backpropagation, resulting in a total number of wins of five and a general visit count of seven at the root node.', 'duration': 446.837, 'highlights': ['The chapter describes the MCTS process, including the expansion, simulation, and backpropagation phases. It explains the different phases involved in the Monte Carlo Tree Search (MCTS) algorithm.', 'Updating win and visit counts during backpropagation results in a total number of wins of five and a general visit count of seven at the root node. During backpropagation, the total number of wins is updated to five and the general visit count to seven at the root node.', 'The chapter explains the process of simulating random actions to play into the future until reaching a terminal node. 
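The four phases listed above (selection, expansion, simulation, backpropagation) are repeated for a fixed number of searches, and the root's child visit counts are then turned into a move distribution. A sketch of that loop, assuming a Node class with the methods named in this section (its constructor, selection, expansion and backpropagation are sketched further below; simulate is the random rollout described earlier):

```python
import numpy as np

class MCTS:
    def __init__(self, game, args):
        self.game = game
        self.args = args                      # e.g. {'C': 1.41, 'num_searches': 1000}

    def search(self, state):
        root = Node(self.game, self.args, state)
        for _ in range(self.args['num_searches']):
            node = root
            # 1) selection: walk down while the current node is fully expanded
            while node.is_fully_expanded():
                node = node.select()
            value, is_terminal = self.game.get_value_and_terminated(node.state, node.action_taken)
            value = self.game.get_opponent_value(value)   # result seen from this node's perspective
            if not is_terminal:
                node = node.expand()                      # 2) expansion
                value = node.simulate()                   # 3) simulation: random rollout to a terminal state
            node.backpropagate(value)                     # 4) backpropagation
        # distribution of visit counts over the root's children
        action_probs = np.zeros(self.game.action_size)
        for child in root.children:
            action_probs[child.action_taken] = child.visit_count
        return action_probs / np.sum(action_probs)
```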
It details the simulation phase involving playing randomly into the future until a terminal node is reached.']}, {'end': 1622.337, 'start': 1265.598, 'title': 'Monte carlo tree search example', 'summary': 'Discusses the monte carlo tree search process, where it iteratively selects, expands, simulates, and back propagates to determine the best action, with an example showing a total of 4 visits and 1.5 wins at the root node.', 'duration': 356.739, 'highlights': ['The total number of visits at the root node is 4, with a total number of wins of 1.5, demonstrating the results of the Monte Carlo tree search process.', 'The process involves iterative selection, expansion, simulation, and back propagation to determine the best action.', 'The UCB score is calculated for each child node to select the one with the highest score, guiding the Monte Carlo tree search process.']}], 'duration': 1439.849, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB4182488.jpg', 'highlights': ['The neural network architecture involves taking the game state as input and producing a policy and a value as output, where the policy determines the promising actions based on the state, and the value indicates the desirability of the state for the player.', 'The Monte Carlo Tree Search algorithm involves creating a tree structure where each node stores state, winning count (w), and visit count (n), enabling the determination of the most promising action based on win ratios and visit counts.', 'The selection phase involves walking down the tree by choosing the child with the highest UCB formula, maximizing winning ratio and minimizing visits.', 'The chapter describes the MCTS process, including the expansion, simulation, and backpropagation phases.', 'The total number of visits at the root node is 4, with a total number of wins of 1.5, demonstrating the results of the Monte Carlo tree search process.']}, {'end': 2763.909, 'segs': [{'end': 1785.584, 'src': 'heatmap', 'start': 1636.121, 'weight': 0.861, 'content': [{'end': 1644.948, 'text': 'So the first thing is that we also want to incorporate the policy that was gained from our model into the search process right?', 'start': 1636.121, 'duration': 8.827}, {'end': 1649.192, 'text': 'And especially, we want to add it to the selection phase right here.', 'start': 1645.789, 'duration': 3.403}, {'end': 1659.544, 'text': 'where you see this p of i part right here.', 'start': 1655.6, 'duration': 3.944}, {'end': 1667.632, 'text': 'so basically, when we select the child and we want to take the yeah children with the highest ucd formula,', 'start': 1659.544, 'duration': 8.088}, {'end': 1674.023, 'text': 'then we will also that was assigned to it from its parent perspective.', 'start': 1667.632, 'duration': 6.391}, {'end': 1680.926, 'text': 'so remember that the policy is just this distribution of likelihoods and basically for each child.', 'start': 1674.023, 'duration': 6.903}, {'end': 1691.149, 'text': 'when we expand it, we will also store this policy likelihood at the given position here for the node as well,', 'start': 1680.926, 'duration': 10.223}, {'end': 1698.447, 'text': 'and because of that we then also tend to select children more often that were assigned a high policy by its parent right?', 'start': 1691.149, 'duration': 7.298}, {'end': 1705.269, 'text': 'So this way our model can guide us through the selection phase inside of our Monte Carlo tree search.', 'start': 1699.028, 'duration': 6.241}, {'end': 1707.83, 'text': 
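The first key change described above is that the policy prior stored on each child at expansion time enters the selection score, so moves the network already likes are visited more often. In the usual AlphaZero-style (PUCT) notation, with the exact constant placement possibly differing from the video:

$$\text{UCB}(c) \;=\; Q(c) \;+\; C \cdot P(c) \cdot \frac{\sqrt{N_{\text{parent}}}}{1 + N_c}$$

where $P(c)$ is the prior assigned to child $c$ by its parent's policy and $Q(c)$ is the child's mean backed-up value from the parent's perspective.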
'So this is the first key change here.', 'start': 1705.95, 'duration': 1.88}, {'end': 1711.291, 'text': 'So just generally this updated UCB formula right here.', 'start': 1708.29, 'duration': 3.001}, {'end': 1723.249, 'text': 'information of the value that we got from our neural network, and we can use this information by, first of all,', 'start': 1716.203, 'duration': 7.046}, {'end': 1728.233, 'text': 'completely getting rid of this simulation phase right here.', 'start': 1723.249, 'duration': 4.984}, {'end': 1736.34, 'text': "So we don't want to do these random rollouts into the future anymore until we have reached a terminal state,", 'start': 1728.974, 'duration': 7.366}, {'end': 1743.754, 'text': 'but rather we will just use the value network when it basically evaluated a certain state.', 'start': 1736.34, 'duration': 7.414}, {'end': 1746.955, 'text': 'And this value will then be used for backpropagation.', 'start': 1744.334, 'duration': 2.621}, {'end': 1757.017, 'text': 'So this way we use both the policy for our selection and the value for our backpropagation of our neural network.', 'start': 1748.495, 'duration': 8.522}, {'end': 1766.1, 'text': 'And because of that we know that our Monte Carlo tree search will improve drastically when we also have a model that understands how to play the game.', 'start': 1757.498, 'duration': 8.602}, {'end': 1775.536, 'text': 'This way, at the end, we then have a better search with a better model that can even create a much better model.', 'start': 1767.25, 'duration': 8.286}, {'end': 1778.559, 'text': 'So we can keep the cycle up as well.', 'start': 1775.556, 'duration': 3.003}, {'end': 1785.584, 'text': "So there's also just a small change here as well, just a minor one.", 'start': 1779.74, 'duration': 5.844}], 'summary': 'Incorporating policy from model improves monte carlo tree search, guiding selection and backpropagation for better game play.', 'duration': 149.463, 'max_score': 1636.121, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB41636121.jpg'}, {'end': 2117.35, 'src': 'embed', 'start': 2088.821, 'weight': 6, 'content': [{'end': 2091.484, 'text': 'because we have a visit count that is equal to zero.', 'start': 2088.821, 'duration': 2.663}, {'end': 2096.59, 'text': 'So currently the way this is implemented, we would get a division by zero error.', 'start': 2091.524, 'duration': 5.066}, {'end': 2104.667, 'text': 'So here we have to make sure that we will only check for the winning probability if we have a visit count that is larger than zero.', 'start': 2097.311, 'duration': 7.356}, {'end': 2109.468, 'text': 'So because of that, I will just set that in brackets right here.', 'start': 2105.307, 'duration': 4.161}, {'end': 2114.91, 'text': 'So we will only check for this if we have a visit count that is larger than zero.', 'start': 2109.928, 'duration': 4.982}, {'end': 2117.35, 'text': 'And in other cases, we will just mask it out.', 'start': 2114.95, 'duration': 2.4}], 'summary': 'Implement check for winning probability only if visit count > 0', 'duration': 28.529, 'max_score': 2088.821, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB42088821.jpg'}, {'end': 2280.574, 'src': 'embed', 'start': 2219.056, 'weight': 5, 'content': [{'end': 2225.797, 'text': 'So now, first of all, we can also add this information here to our notes again.', 'start': 2219.056, 'duration': 6.741}, {'end': 2230.698, 'text': "So let's first of all set w four 
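The two changes discussed above, using the value network instead of random rollouts and masking the Q term when a child has never been visited, can be folded into the node's UCB method. A sketch of such a get_ucb written as a Node method; the (q + 1) / 2 rescaling and sign flip turn the child's own value estimate into the parent's point of view, and child.prior is assumed to be the policy probability stored at expansion:

```python
import math

def get_ucb(self, child):
    if child.visit_count == 0:
        q_value = 0.0   # no estimate yet: avoid dividing by zero and rely on the prior-weighted exploration term
    else:
        # child.value_sum is accumulated from the child's perspective; rescale to [0, 1]
        # and flip it so that "good for the child" counts as "bad for the parent"
        q_value = 1 - ((child.value_sum / child.visit_count) + 1) / 2
    return q_value + self.args['C'] * child.prior * math.sqrt(self.visit_count) / (child.visit_count + 1)
```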
equal to zero.", 'start': 2226.358, 'duration': 4.34}, {'end': 2233.999, 'text': 'And we can say n four equal to zero.', 'start': 2231.279, 'duration': 2.72}, {'end': 2243.781, 'text': 'And then policy probability four should be equal to 0.5.', 'start': 2235.079, 'duration': 8.702}, {'end': 2248.322, 'text': 'And then here we also have w three, which should just be zero.', 'start': 2243.781, 'duration': 4.541}, {'end': 2253.281, 'text': 'And we have N3, which should also be zero.', 'start': 2250.54, 'duration': 2.741}, {'end': 2259.325, 'text': 'And then we have also our policy probability three here, and this should be equal to 0.5.', 'start': 2254.282, 'duration': 5.043}, {'end': 2261.386, 'text': 'I hope you can read that here.', 'start': 2259.325, 'duration': 2.061}, {'end': 2273.072, 'text': 'Okay, so now we have expanded these two children right here and created them and stored the policy information here and served them.', 'start': 2263.067, 'duration': 10.005}, {'end': 2276.172, 'text': 'so now we want to back propagate again.', 'start': 2273.951, 'duration': 2.221}, {'end': 2280.574, 'text': 'so here we will just use this value of 0.1.', 'start': 2276.172, 'duration': 4.402}], 'summary': 'Initialize w4 and n4 to 0, set policy probability to 0.5, propagate with 0.1.', 'duration': 61.518, 'max_score': 2219.056, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB42219056.jpg'}, {'end': 2684.9, 'src': 'heatmap', 'start': 2511.993, 'weight': 3, 'content': [{'end': 2519.476, 'text': 'The reward is equal to the final outcome for the player that we are on this given state.', 'start': 2511.993, 'duration': 7.483}, {'end': 2523.959, 'text': 'So basically in this example right here, X won the game.', 'start': 2520.336, 'duration': 3.623}, {'end': 2533.285, 'text': 'So that means that for all states when we were player X, we want the final reward or final outcome be equal to one.', 'start': 2524.759, 'duration': 8.526}, {'end': 2541.471, 'text': 'And in all cases when we were player O, we want the reward to be negative one because we lost the game.', 'start': 2533.986, 'duration': 7.485}, {'end': 2546.155, 'text': 'So, basically, that means that you know what we want.', 'start': 2542.252, 'duration': 3.903}, {'end': 2548.036, 'text': 'this game is play x right here.', 'start': 2546.155, 'duration': 1.881}, {'end': 2556.24, 'text': 'so we might just guess that this state also is quite promising, because, yeah, this state led us to win the game eventually.', 'start': 2548.036, 'duration': 8.204}, {'end': 2561.583, 'text': 'so this is why we turn change this reward to positive one here when we are playing x,', 'start': 2556.24, 'duration': 5.343}, {'end': 2568.607, 'text': 'and this is also the reason why we change the reward to negative one when we are player player over here.', 'start': 2561.583, 'duration': 7.024}, {'end': 2578.197, 'text': 'so these combinations of the state, the mcts distribution and the reward will then be stored as tuples to our training data.', 'start': 2568.607, 'duration': 9.59}, {'end': 2585.25, 'text': 'And then we can later use these for training in order to improve our model.', 'start': 2578.838, 'duration': 6.412}, {'end': 2591.397, 'text': 'So this is great, but now we have to understand how training works.', 'start': 2587.156, 'duration': 4.241}, {'end': 2594.118, 'text': "So let's look at this right here.", 'start': 2592.138, 'duration': 1.98}, {'end': 2599.54, 'text': 'So at the beginning, we just take a 
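The reward rule described above, +1 for every stored position where the eventual winner was to move and -1 where the loser was, is applied once a self-play game terminates, and each position is saved together with its MCTS distribution. A sketch of that bookkeeping under those assumptions (function and variable names are illustrative):

```python
def label_finished_game(memory, value, final_player, game):
    """memory: list of (state, mcts_action_probs, player) gathered during one self-play game.
    value / final_player: terminal outcome and the player who made the last move."""
    training_data = []
    for hist_state, hist_action_probs, hist_player in memory:
        # same player as the one the outcome refers to -> value, otherwise the opposite sign
        hist_outcome = value if hist_player == final_player else game.get_opponent_value(value)
        training_data.append((hist_state, hist_action_probs, hist_outcome))  # (s, pi, z) tuple
    return training_data
```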
sample from our training data.', 'start': 2595.319, 'duration': 4.221}, {'end': 2609.604, 'text': 'And you should know now that the sample is the state, the MCTS distribution, and pi, and the reward z right here.', 'start': 2600.361, 'duration': 9.243}, {'end': 2615.609, 'text': 'Then we will use the state s as the input for our model.', 'start': 2610.564, 'duration': 5.045}, {'end': 2621.214, 'text': 'Then we will get this policy and this value out as a return.', 'start': 2616.489, 'duration': 4.725}, {'end': 2634.305, 'text': 'And now for training, the next step basically is to minimize the difference between the policy p and the MCTS distribution at the given state pi.', 'start': 2623.015, 'duration': 11.29}, {'end': 2640.754, 'text': 'on one hand, and then we also want to minimize the difference of our value.', 'start': 2635.769, 'duration': 4.985}, {'end': 2648.621, 'text': 'v here and the final reward or final outcome z we sampled from our training data.', 'start': 2640.754, 'duration': 7.867}, {'end': 2661.432, 'text': 'And the way we can minimize the difference basically in a loss is, first of all, by having a mean squared error between the reward and the value here.', 'start': 2649.301, 'duration': 12.131}, {'end': 2671.536, 'text': 'And then by also having this multi-target cross entropy loss between our MCTS distribution Pi and our policy P right here.', 'start': 2662.233, 'duration': 9.303}, {'end': 2676.617, 'text': 'Then we also have some form of a true regularization at the end, but yeah.', 'start': 2672.336, 'duration': 4.281}, {'end': 2684.9, 'text': 'So essentially we want to have this loss right here, and then we want to minimize the loss by back propagation.', 'start': 2676.877, 'duration': 8.023}], 'summary': 'Training data includes state, mcts distribution, and reward values to improve the model through minimizing differences in policy and value using mean squared error and cross entropy loss.', 'duration': 29.478, 'max_score': 2511.993, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB42511993.jpg'}, {'end': 2726.319, 'src': 'embed', 'start': 2699.264, 'weight': 2, 'content': [{'end': 2708.186, 'text': 'Then we can use this optimized model to again play with itself in order to gain more information, in order to train again, and so on.', 'start': 2699.264, 'duration': 8.922}, {'end': 2710.986, 'text': 'So this is how AlphaZero is structured.', 'start': 2708.926, 'duration': 2.06}, {'end': 2713.707, 'text': 'And now we can actually get to coding.', 'start': 2711.606, 'duration': 2.101}, {'end': 2717.57, 'text': "So let's actually start by programming AlphaZero.", 'start': 2714.847, 'duration': 2.723}, {'end': 2726.319, 'text': "So first of all, we're going to build everything inside of a Jupyter notebook, since the interactivity might be nice for understanding the algorithm.", 'start': 2717.91, 'duration': 8.409}], 'summary': "Alphazero's structured approach involves playing and training with an optimized model, and the algorithm is implemented within a jupyter notebook for enhanced interactivity.", 'duration': 27.055, 'max_score': 2699.264, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB42699264.jpg'}, {'end': 2773.841, 'src': 'embed', 'start': 2742.314, 'weight': 0, 'content': [{'end': 2745.216, 'text': 'And then we will build a Monte Carlo tree search around it.', 'start': 2742.314, 'duration': 2.902}, {'end': 2754.202, 'text': 'And after we have gone so 
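The training objective spelled out above, a multi-target cross-entropy between the predicted policy and the MCTS distribution pi plus a mean-squared error between the predicted value and the final outcome z, translates into a short PyTorch step like the following sketch (weight regularization is assumed to be handled by the optimizer's weight decay):

```python
import torch.nn.functional as F

def train_step(model, optimizer, states, target_pi, target_z):
    # states: batch of encoded board states; target_pi: MCTS distributions; target_z: final outcomes
    out_policy, out_value = model(states)
    policy_loss = F.cross_entropy(out_policy, target_pi)    # policy logits vs. pi
    value_loss = F.mse_loss(out_value, target_z)            # value vs. z
    loss = policy_loss + value_loss
    optimizer.zero_grad()
    loss.backward()                                          # minimize the loss by backpropagation
    optimizer.step()
    return loss.item()
```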
far, we will eventually build AlphaZero on top of the Monte Carlo tree search we had previously.', 'start': 2746.036, 'duration': 8.166}, {'end': 2759.526, 'text': 'And then we will expand our portfolio to Connect4 as well.', 'start': 2755.183, 'duration': 4.343}, {'end': 2763.909, 'text': 'And not only should this be easier to understand,', 'start': 2760.506, 'duration': 3.403}, {'end': 2773.841, 'text': 'but it should also show how flexible FS0 really is when it comes to solving different environments or board games in this case.', 'start': 2765.336, 'duration': 8.505}], 'summary': "Building monte carlo tree search, then alphazero, and expanding to connect4 demonstrates fs0's flexibility and adaptability.", 'duration': 31.527, 'max_score': 2742.314, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB42742314.jpg'}], 'start': 1623.271, 'title': 'Monte carlo tree search and alpha zero algorithm', 'summary': "Explores the adaptation of monte carlo tree search to the general alpha zero algorithm, emphasizing the policy incorporation into the search process, resulting in more frequent selection of high policy children. it also discusses the updated ucb formula, highlighting the usage of value network for backpropagation and policy for selection, leading to a drastic improvement in monte carlo tree search with specific examples and probabilities provided. additionally, the chapter explains the adaptation of the monte carlo tree search algorithm in alphazero, emphasizing self-play and training process to improve the model's game-playing ability.", 'chapters': [{'end': 1707.83, 'start': 1623.271, 'title': 'Monte carlo tree search with alpha zero algorithm', 'summary': 'Explores how the monte carlo tree search is adapted to the general alpha zero algorithm, emphasizing the incorporation of the policy gained from the model into the search process, leading to more frequent selection of children with high policy assignments.', 'duration': 84.559, 'highlights': ['The incorporation of the policy gained from the model into the search process, leading to more frequent selection of children with high policy assignments.', 'The adaptation of Monte Carlo tree search to the general alpha zero algorithm, involving two key changes.']}, {'end': 2054.261, 'start': 1708.29, 'title': 'Improving monte carlo tree search', 'summary': 'Discusses the updated ucb formula, emphasizing the usage of the value network for backpropagation and policy for selection, resulting in a drastic improvement in monte carlo tree search, with specific examples and probabilities provided for better understanding.', 'duration': 345.971, 'highlights': ["The value network is used for backpropagation, resulting in a drastic improvement in Monte Carlo tree search, and the model's ability to play the game and create a better model is emphasized.", 'The policy probabilities obtained from the neural network enable convenient expansion in all possible directions during the expansion phase, allowing for the creation of multiple nodes instead of just one.', 'Detailed examples are provided, demonstrating the process of creating new nodes, assigning policy probabilities, and back propagating values, with specific probabilities and values mentioned for better comprehension.']}, {'end': 2763.909, 'start': 2056.425, 'title': 'Alphazero: monte carlo tree search and self-play', 'summary': "Explains the monte carlo tree search algorithm and its adaptation in alphazero, emphasizing self-play and training 
process to improve the model's game-playing ability.", 'duration': 707.484, 'highlights': ["During self-play, a Monte Carlo tree search is performed, and the sampled actions and resulting game states are used to generate training data. The Monte Carlo tree search is used for self-play, and the sampled actions and resulting game states are stored as training data to improve the model's game-playing ability.", "In training, the model's policy and value are optimized by minimizing the difference between the MCTS distribution, value, and reward using loss functions and backpropagation. The model's policy and value are optimized in training by minimizing the difference between the MCTS distribution, value, and reward using loss functions and backpropagation, resulting in updated model weights and improved game-playing ability.", 'The chapter also outlines the intention to program AlphaZero within a Jupyter notebook, starting with a simple tic-tac-toe game and expanding to Connect4. The chapter outlines the plan to program AlphaZero within a Jupyter notebook, starting with a simple tic-tac-toe game and expanding to Connect4 to facilitate understanding and efficient training using external resources.']}], 'duration': 1140.638, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB41623271.jpg', 'highlights': ['The adaptation of Monte Carlo tree search to the general alpha zero algorithm, involving two key changes.', 'The incorporation of the policy gained from the model into the search process, leading to more frequent selection of children with high policy assignments.', "The value network is used for backpropagation, resulting in a drastic improvement in Monte Carlo tree search, and the model's ability to play the game and create a better model is emphasized.", 'The policy probabilities obtained from the neural network enable convenient expansion in all possible directions during the expansion phase, allowing for the creation of multiple nodes instead of just one.', 'During self-play, a Monte Carlo tree search is performed, and the sampled actions and resulting game states are used to generate training data.', "In training, the model's policy and value are optimized by minimizing the difference between the MCTS distribution, value, and reward using loss functions and backpropagation, resulting in updated model weights and improved game-playing ability.", 'The chapter outlines the plan to program AlphaZero within a Jupyter notebook, starting with a simple tic-tac-toe game and expanding to Connect4 to facilitate understanding and efficient training using external resources.', 'Detailed examples are provided, demonstrating the process of creating new nodes, assigning policy probabilities, and back propagating values, with specific probabilities and values mentioned for better comprehension.']}, {'end': 3631.211, 'segs': [{'end': 3365.484, 'src': 'embed', 'start': 3335.364, 'weight': 1, 'content': [{'end': 3337.105, 'text': "So let's just write a method, get opponent.", 'start': 3335.364, 'duration': 1.741}, {'end': 3345.667, 'text': "And yeah, take a player's input, and then we'll just return the negative player.", 'start': 3339.125, 'duration': 6.542}, {'end': 3349.007, 'text': 'So if our initial player would be negative one, we would return one.', 'start': 3345.727, 'duration': 3.28}, {'end': 3352.188, 'text': 'And if our initial player would be one, then we would just return negative one.', 'start': 3349.067, 'duration': 3.121}, {'end': 
3353.688, 'text': "So that's great.", 'start': 3352.968, 'duration': 0.72}, {'end': 3357.269, 'text': 'And now we can test our game we built right here.', 'start': 3354.408, 'duration': 2.861}, {'end': 3361.83, 'text': 'So I just say tick-tack-toe equals tick-tack-toe.', 'start': 3357.309, 'duration': 4.521}, {'end': 3365.484, 'text': 'Then I just say player equals 1.', 'start': 3363.14, 'duration': 2.344}], 'summary': 'A method was created to get the opponent, returning the negative of the player input, allowing for testing of the tic-tac-toe game.', 'duration': 30.12, 'max_score': 3335.364, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB43335364.jpg'}, {'end': 3541.661, 'src': 'embed', 'start': 3497.722, 'weight': 3, 'content': [{'end': 3505.488, 'text': 'so we create a new get a new state and the new state will be created by calling tic-tac-toe dot, get next state.', 'start': 3497.722, 'duration': 7.766}, {'end': 3513.134, 'text': 'And we want to give the old state the action that the players input.', 'start': 3508.351, 'duration': 4.783}, {'end': 3523.679, 'text': 'Great So then we can check if the game has been terminated.', 'start': 3518.276, 'duration': 5.403}, {'end': 3535.165, 'text': "So we'll just say value is terminal equals tic tac toe dot get value and terminated.", 'start': 3524.439, 'duration': 10.726}, {'end': 3541.661, 'text': 'And then we want to give the state and the actions input.', 'start': 3537.939, 'duration': 3.722}], 'summary': 'Create new game state, check termination, and get value.', 'duration': 43.939, 'max_score': 3497.722, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB43497722.jpg'}, {'end': 3631.211, 'src': 'embed', 'start': 3582.5, 'weight': 0, 'content': [{'end': 3586.342, 'text': 'And in all other cases, if the game continues, we also want to flip the player.', 'start': 3582.5, 'duration': 3.842}, {'end': 3594.428, 'text': 'So we just say player equals tic tac toe dot get opponent of player.', 'start': 3587.463, 'duration': 6.965}, {'end': 3597.926, 'text': 'so nice, this should be working.', 'start': 3596.286, 'duration': 1.64}, {'end': 3599.507, 'text': "so let's test this out.", 'start': 3597.926, 'duration': 1.581}, {'end': 3601.367, 'text': 'so here are the valid moves.', 'start': 3599.507, 'duration': 1.86}, {'end': 3602.907, 'text': "look, it's looking nice.", 'start': 3601.367, 'duration': 1.54}, {'end': 3604.788, 'text': "so let's just pick zero for starters.", 'start': 3602.907, 'duration': 1.881}, {'end': 3607.048, 'text': "let's see nice.", 'start': 3604.788, 'duration': 2.26}, {'end': 3608.629, 'text': 'okay, we played here.', 'start': 3607.048, 'duration': 1.581}, {'end': 3610.549, 'text': 'so we are player negative one now.', 'start': 3608.629, 'duration': 1.92}, {'end': 3611.849, 'text': "so let's just play four.", 'start': 3610.549, 'duration': 1.3}, {'end': 3613.31, 'text': "let's play zero.", 'start': 3611.849, 'duration': 1.461}, {'end': 3616.15, 'text': 'we can just say eight, play negative one one.', 'start': 3613.31, 'duration': 2.84}, {'end': 3618.051, 'text': 'for example, play zero.', 'start': 3616.15, 'duration': 1.901}, {'end': 3619.991, 'text': "let's say two and play negative one.", 'start': 3618.051, 'duration': 1.94}, {'end': 3621.251, 'text': 'you can just say seven.', 'start': 3619.991, 'duration': 1.26}, {'end': 3622.832, 'text': 'and nice, we see here.', 'start': 3621.251, 'duration': 1.581}, {'end': 
3626.91, 'text': 'we got three negative ones inside of this column right here.', 'start': 3622.832, 'duration': 4.078}, {'end': 3631.211, 'text': 'And thus we get the result that player negative one has won.', 'start': 3627.39, 'duration': 3.821}], 'summary': 'During the game, player -1 wins with 3 negative ones in a column.', 'duration': 48.711, 'max_score': 3582.5, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB43582500.jpg'}], 'start': 2765.336, 'title': 'Tic-tac-toe game with numpy', 'summary': 'Demonstrates creating a flexible tic-tac-toe game with numpy, explaining winning conditions, and covering game implementation, providing a comprehensive guide to creating and implementing a tic-tac-toe game using numpy.', 'chapters': [{'end': 3022.361, 'start': 2765.336, 'title': 'Creating tic-tac-toe game with numpy', 'summary': 'Demonstrates how to create a flexible tic-tac-toe game using numpy, defining the game board, actions, and legal moves, with a focus on code implementation and logic.', 'duration': 257.025, 'highlights': ['The chapter walks through the creation of a tic-tac-toe game using NumPy, including defining the game board dimensions and action size.', "The method to get the initial state is introduced, which initializes the game board with zeros, utilizing NumPy's np.zeros function.", 'The process of encoding actions into rows and columns for the game board using integer division and modulo operations is explained, enabling efficient player moves on the board.', 'The method for determining valid moves based on the state of the game board is detailed, opting for flattened arrays and utilizing np.un8 for efficient processing.', 'The implementation of a method to check for a win by a player after their move is outlined, contributing to the overall game logic and functionality.']}, {'end': 3234.335, 'start': 3024.482, 'title': 'Tic-tac-toe winning conditions', 'summary': 'Explains the method to check for the four winning conditions in tic-tac-toe, including three in a row, three in a column, and both the diagonals, using np.sum and np.diag, in order to return true if any condition is met and false if none are met.', 'duration': 209.853, 'highlights': ['Explaining the four winning conditions in tic-tac-toe Details the four ways to win a game in tic-tac-toe: three in a row, three in a column, and both the diagonals, with the method to check each condition.', 'Using np.sum to check for three in a row and three in a column Demonstrates the use of np.sum to check if there are three in a row or three in a column, by summing the state of the given row/column and checking if it equals player times the columnCount or rowCount.', 'Utilizing np.diag method to check for both diagonals Illustrates the use of np.diag method to check for both diagonals, by summing the np.diag of the state and checking if it equals player times self.rowCount or self.columnCount, and flipping the state to check the opposite diagonal.']}, {'end': 3631.211, 'start': 3234.335, 'title': 'Tic-tac-toe game implementation', 'summary': 'Covers the implementation of a working tic-tac-toe game, including methods for checking game termination, determining the opponent, and testing the game functionality.', 'duration': 396.876, 'highlights': ['Implemented methods for checking game termination, determining the opponent, and testing game functionality The chapter covers the implementation of a working tic-tac-toe game, including methods for checking game termination, 
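The game-class summary above (board dimensions, an initial state of zeros, actions encoded as a flat index split into row and column, valid moves read off the flattened board, and a win check over the last move's row, column and both diagonals) can be condensed into the following sketch. Method names follow the transcript; small details may differ from the tutorial's exact code:

```python
import numpy as np

class TicTacToe:
    def __init__(self):
        self.row_count = 3
        self.column_count = 3
        self.action_size = self.row_count * self.column_count

    def get_initial_state(self):
        return np.zeros((self.row_count, self.column_count))

    def get_next_state(self, state, action, player):
        row = action // self.column_count       # integer division -> row
        column = action % self.column_count     # modulo -> column
        state[row, column] = player
        return state

    def get_valid_moves(self, state):
        return (state.reshape(-1) == 0).astype(np.uint8)   # 1 where the cell is still empty

    def check_win(self, state, action):
        if action is None:                       # e.g. the root node before any move
            return False
        row = action // self.column_count
        column = action % self.column_count
        player = state[row, column]
        return (
            np.sum(state[row, :]) == player * self.column_count                    # three in a row
            or np.sum(state[:, column]) == player * self.row_count                 # three in a column
            or np.sum(np.diag(state)) == player * self.row_count                   # main diagonal
            or np.sum(np.diag(np.flip(state, axis=0))) == player * self.row_count  # anti-diagonal
        )

    def get_value_and_terminated(self, state, action):
        if self.check_win(state, action):
            return 1, True
        if np.sum(self.get_valid_moves(state)) == 0:
            return 0, True                       # board full: draw
        return 0, False

    def get_opponent(self, player):
        return -player

    def get_opponent_value(self, value):
        return -value

    def change_perspective(self, state, player):
        return state * player                    # flip signs so the player to move always "sees" itself as 1
```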
determining the opponent, and testing the game functionality.', 'Defined method to change the player by returning the negative player A method called get opponent was defined to change the player by returning the negative player, aiding in the implementation of different board games.', 'Checked for game termination and determined the winner or draw based on the game state and actions taken The process involved checking for game termination and determining the winner or draw based on the game state and actions taken, with examples of printing the state and player status.']}], 'duration': 865.875, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB42765336.jpg', 'highlights': ['The chapter covers the implementation of a working tic-tac-toe game, including methods for checking game termination, determining the opponent, and testing the game functionality.', 'Explaining the four winning conditions in tic-tac-toe Details the four ways to win a game in tic-tac-toe: three in a row, three in a column, and both the diagonals, with the method to check each condition.', 'The method for determining valid moves based on the state of the game board is detailed, opting for flattened arrays and utilizing np.un8 for efficient processing.', "The method to get the initial state is introduced, which initializes the game board with zeros, utilizing NumPy's np.zeros function.", 'The implementation of a method to check for a win by a player after their move is outlined, contributing to the overall game logic and functionality.']}, {'end': 4678.008, 'segs': [{'end': 3658.835, 'src': 'embed', 'start': 3632.191, 'weight': 1, 'content': [{'end': 3638.213, 'text': 'Perfect So now, since we have got our game of tic-tac-toe ready, we can actually build the Monte Carlo tree search around it.', 'start': 3632.191, 'duration': 6.022}, {'end': 3641.274, 'text': "So let's just create a new cell right here.", 'start': 3639.013, 'duration': 2.261}, {'end': 3645.315, 'text': 'And then we want to have a class for our Monte Carlo tree search.', 'start': 3642.294, 'duration': 3.021}, {'end': 3647.916, 'text': "So I'm just going to call that MCTS for now.", 'start': 3645.515, 'duration': 2.401}, {'end': 3651.157, 'text': 'And then we have our init here.', 'start': 3649.296, 'duration': 1.861}, {'end': 3658.835, 'text': 'And we want to pass on a game, so tic-tac-toe in this case, and then also some arguments.', 'start': 3652.731, 'duration': 6.104}], 'summary': 'Building monte carlo tree search around tic-tac-toe game', 'duration': 26.644, 'max_score': 3632.191, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB43632191.jpg'}, {'end': 3771.169, 'src': 'embed', 'start': 3738.474, 'weight': 0, 'content': [{'end': 3741.536, 'text': "So let's just say return visit counts.", 'start': 3738.474, 'duration': 3.062}, {'end': 3743.858, 'text': 'Yeah, at the end.', 'start': 3743.117, 'duration': 0.741}, {'end': 3748.001, 'text': "Yeah, so that's the structure we have inside of our multicolored tree search.", 'start': 3744.838, 'duration': 3.163}, {'end': 3753.683, 'text': 'And next we can actually define a class for a node as well.', 'start': 3749.222, 'duration': 4.461}, {'end': 3758.565, 'text': "So let's write class node here and then have our init again.", 'start': 3753.703, 'duration': 4.862}, {'end': 3764.787, 'text': "And first of all, we'd like to pass on the game and the arguments from the MCTS itself.", 'start': 3759.985, 
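The node constructor started at the end of the passage above stores exactly the attributes listed there: the game and its arguments, the node's state, its parent and the action that led to it, its children, the moves that can still be expanded, and the visit and value counters. A sketch:

```python
class Node:
    def __init__(self, game, args, state, parent=None, action_taken=None):
        self.game = game
        self.args = args
        self.state = state
        self.parent = parent
        self.action_taken = action_taken                      # move that led from the parent to this node
        self.children = []
        self.expandable_moves = game.get_valid_moves(state)   # 1 = move not expanded yet
        self.visit_count = 0                                  # N
        self.value_sum = 0                                    # W, sum of backpropagated values
```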
'duration': 4.802}, {'end': 3771.169, 'text': 'And then also we want to have a state as a node and then a parent.', 'start': 3765.827, 'duration': 5.342}], 'summary': 'The multicolored tree search involves return visit counts and defining a class for a node with game and state parameters.', 'duration': 32.695, 'max_score': 3738.474, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB43738474.jpg'}, {'end': 3939.802, 'src': 'embed', 'start': 3910.237, 'weight': 3, 'content': [{'end': 3918.763, 'text': 'So from the video, you might remember that we want to keep selecting downwards the tree as long as our nodes are fully expanded themselves.', 'start': 3910.237, 'duration': 8.526}, {'end': 3923.867, 'text': 'So for doing so, we should write a method here that will tell us.', 'start': 3919.584, 'duration': 4.283}, {'end': 3925.889, 'text': 'So dev is fully expanded.', 'start': 3924.027, 'duration': 1.862}, {'end': 3933.178, 'text': 'And the node is fully expanded if there are no expandable moves.', 'start': 3929.435, 'duration': 3.743}, {'end': 3934.418, 'text': 'So that makes sense.', 'start': 3933.518, 'duration': 0.9}, {'end': 3939.802, 'text': 'And also if the number, if there are children right?', 'start': 3935.039, 'duration': 4.763}], 'summary': 'To select downwards, nodes must be fully expanded with no expandable moves or children.', 'duration': 29.565, 'max_score': 3910.237, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB43910237.jpg'}, {'end': 4474.563, 'src': 'embed', 'start': 4444.904, 'weight': 2, 'content': [{'end': 4451.468, 'text': "And we should also keep one thing in mind, and that is that we're using this node.actionTaken method right here.", 'start': 4444.904, 'duration': 6.564}, {'end': 4461.875, 'text': 'But actually at the beginning we just have our simple root node and we can see that the root node is not fully expanded because it has all possible ways we could expand it on.', 'start': 4451.608, 'duration': 10.267}, {'end': 4466.298, 'text': 'So we will call this right here immediately with our root node.', 'start': 4462.555, 'duration': 3.743}, {'end': 4474.563, 'text': 'And the way we initiated our root node, we still have action taken is none, right?', 'start': 4466.998, 'duration': 7.565}], 'summary': 'Using node.actiontaken method, root node not fully expanded, with action taken as none.', 'duration': 29.659, 'max_score': 4444.904, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB44444904.jpg'}, {'end': 4678.008, 'src': 'embed', 'start': 4646.696, 'weight': 4, 'content': [{'end': 4652.16, 'text': 'And with mp.where, you always have to use the first argument you get when using the method.', 'start': 4646.696, 'duration': 5.464}, {'end': 4654.161, 'text': 'So yeah, this should work now.', 'start': 4652.9, 'duration': 1.261}, {'end': 4658.203, 'text': 'So yeah, we just first of all check what moves are legal, then we.', 'start': 4654.441, 'duration': 3.762}, {'end': 4665.092, 'text': 'use np.where to get the indices from all legal moves.', 'start': 4659.725, 'duration': 5.367}, {'end': 4670.98, 'text': 'And then we use np.random.choice to randomly sample one indices.', 'start': 4665.993, 'duration': 4.987}, {'end': 4678.008, 'text': "And this legal indices will then be the action we got, right? 
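The selection details above, a node counts as fully expanded once it has no expandable moves left and at least one child, and selection picks the child with the best UCB, map onto two short Node methods plus the plain-MCTS UCB used in this chapter. A sketch:

```python
import math
import numpy as np

def is_fully_expanded(self):
    return np.sum(self.expandable_moves) == 0 and len(self.children) > 0

def select(self):
    best_child, best_ucb = None, -np.inf
    for child in self.children:
        ucb = self.get_ucb(child)
        if ucb > best_ucb:
            best_child, best_ucb = child, ucb
    return best_child

def get_ucb(self, child):
    # win ratio rescaled to [0, 1] and flipped to the parent's perspective,
    # plus the exploration term weighted by the constant args['C']
    q_value = 1 - ((child.value_sum / child.visit_count) + 1) / 2
    return q_value + self.args['C'] * math.sqrt(math.log(self.visit_count) / child.visit_count)
```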
So that's great.", 'start': 4671.641, 'duration': 6.367}], 'summary': 'Using np.where to find legal moves and np.random.choice to select one, ensuring correct usage of mp.where.', 'duration': 31.312, 'max_score': 4646.696, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB44646696.jpg'}], 'start': 3632.191, 'title': 'Implementing monte carlo tree search', 'summary': 'Explores the implementation of monte carlo tree search algorithm for tic-tac-toe, detailing the structure of the search method, node class attributes, iterative processes, and visit count distribution. it also covers defining the root node, selecting fully expanded nodes, and calculating ucb score. additionally, it discusses selection, expansion in the game tree, including terminal node checking, value writing, and node expansion through legal move sampling.', 'chapters': [{'end': 3882.573, 'start': 3632.191, 'title': 'Monte carlo tree search for tic-tac-toe', 'summary': 'Outlines the implementation of monte carlo tree search for a tic-tac-toe game, including the structure of the search method, the node class, and the attributes involved, with an emphasis on the iterative processes and the resulting visit count distribution.', 'duration': 250.382, 'highlights': ['The search method is structured into selection, expansion, simulation, and backpropagation phases, iterating over a specified number of searches, and returns the visit count distribution for the children of all root nodes.', 'The node class includes attributes for the game, state, parent, action taken, children, expandable moves, visit count, and value sum, providing a comprehensive structure for the Monte Carlo tree search algorithm.', 'The expandable moves are initially populated with all valid moves for the initial state, and as the node is further expanded, these moves are removed from the list, indicating the potential for future expansion and exploration.']}, {'end': 4301.773, 'start': 3883.553, 'title': 'Monte carlo tree search', 'summary': 'Covers the implementation of the monte carlo tree search algorithm, including defining the root node, selection of fully expanded nodes, and calculating the ucb score for node selection.', 'duration': 418.22, 'highlights': ['Defined the root node by setting self.game, self.args, and the input state, and implemented the method to check if a node is fully expanded based on the absence of expandable moves and the presence of children.', 'Implemented the selection phase to keep selecting downwards the tree as long as the node is fully expanded, using the while loop, and the method for selecting a child based on the highest UCB score.', 'Developed the method for calculating the UCB score by considering the Q value, a constant for exploration or exploitation, and the visit counts of the parent and child nodes.']}, {'end': 4678.008, 'start': 4302.534, 'title': 'Selection and expansion in game tree', 'summary': "Discusses the process of selecting and expanding nodes in a game tree, including checking for terminal nodes, writing value from node's perspective, and expanding nodes by sampling legal moves.", 'duration': 375.474, 'highlights': ['The process of checking whether the selected node is a terminal one or not is crucial before expanding it, determined by writing value is terminal equals save.game.get value and terminated.', "The value from the node's perspective is determined by a method dev get opponent value, which changes the result from the parent's perspective to the 
child's perspective, important for evaluating game outcomes.", 'Node expansion involves sampling one expandable move and creating a new state for the child, followed by appending the child node to the list of children for future reference, illustrating the process of expanding nodes in a game tree.']}], 'duration': 1045.817, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB43632191.jpg', 'highlights': ['The search method is structured into selection, expansion, simulation, and backpropagation phases, iterating over a specified number of searches, and returns the visit count distribution for the children of all root nodes.', 'The node class includes attributes for the game, state, parent, action taken, children, expandable moves, visit count, and value sum, providing a comprehensive structure for the Monte Carlo tree search algorithm.', 'Defined the root node by setting self.game, self.args, and the input state, and implemented the method to check if a node is fully expanded based on the absence of expandable moves and the presence of children.', 'Implemented the selection phase to keep selecting downwards the tree as long as the node is fully expanded, using the while loop, and the method for selecting a child based on the highest UCB score.', 'The process of checking whether the selected node is a terminal one or not is crucial before expanding it, determined by writing value is terminal equals save.game.get value and terminated.', 'Node expansion involves sampling one expandable move and creating a new state for the child, followed by appending the child node to the list of children for future reference, illustrating the process of expanding nodes in a game tree.']}, {'end': 6219.952, 'segs': [{'end': 4799.793, 'src': 'embed', 'start': 4770.345, 'weight': 6, 'content': [{'end': 4777.09, 'text': 'So the parent and the child both think that they are player one, But instead of changing the player, we flip this state around.', 'start': 4770.345, 'duration': 6.745}, {'end': 4781.371, 'text': 'So we turn all positive numbers into negative ones and vice versa.', 'start': 4777.11, 'duration': 4.261}, {'end': 4789.053, 'text': "And this way, we can actually create a child state that has a different player, but still thinks it's player one.", 'start': 4781.971, 'duration': 7.082}, {'end': 4792.694, 'text': 'So this just makes the logic much more easy for our game.', 'start': 4789.513, 'duration': 3.181}, {'end': 4799.793, 'text': 'And it also makes the code here valid for one player games.', 'start': 4794.128, 'duration': 5.665}], 'summary': 'By flipping positive to negative numbers, game logic is simplified and code becomes valid for one player games.', 'duration': 29.448, 'max_score': 4770.345, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB44770345.jpg'}, {'end': 4887.864, 'src': 'embed', 'start': 4860.951, 'weight': 5, 'content': [{'end': 4864.954, 'text': 'And we have got our child state is ready now.', 'start': 4860.951, 'duration': 4.003}, {'end': 4868.617, 'text': 'So next, we can just create the child itself, which is a new node.', 'start': 4865.334, 'duration': 3.283}, {'end': 4870.699, 'text': "So let's say child equals node.", 'start': 4869.118, 'duration': 1.581}, {'end': 4874.682, 'text': 'And up here, we can see that first of all, we need the game.', 'start': 4871.5, 'duration': 3.182}, {'end': 4877.965, 'text': 'So it should just be save.game, then save.args.', 'start': 
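The expansion steps above, sample one still-expandable move, play it as player 1, flip the resulting board so the child also sees itself as player 1, and attach the child node, look roughly like this sketch:

```python
import numpy as np

def expand(self):
    # randomly pick one of the moves that has not been expanded yet
    action = np.random.choice(np.where(self.expandable_moves == 1)[0])
    self.expandable_moves[action] = 0

    child_state = self.state.copy()
    child_state = self.game.get_next_state(child_state, action, 1)
    child_state = self.game.change_perspective(child_state, player=-1)   # flip signs for the opponent

    child = Node(self.game, self.args, child_state, parent=self, action_taken=action)
    self.children.append(child)
    return child
```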
4874.762, 'duration': 3.203}, {'end': 4882.269, 'text': 'And then the state, which is just child state.', 'start': 4880.167, 'duration': 2.102}, {'end': 4887.864, 'text': 'And for the parent, we can choose ourselves as a node.', 'start': 4883.661, 'duration': 4.203}], 'summary': 'Created child node with game, args, state, and parent', 'duration': 26.913, 'max_score': 4860.951, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB44860951.jpg'}, {'end': 5804.363, 'src': 'heatmap', 'start': 5354.472, 'weight': 0.828, 'content': [{'end': 5354.992, 'text': 'And that should be it.', 'start': 5354.472, 'duration': 0.52}, {'end': 5359.277, 'text': 'So we have this recursive backpropagate method.', 'start': 5356.134, 'duration': 3.143}, {'end': 5364.499, 'text': 'And that should be all we need for our MCTS.', 'start': 5361.238, 'duration': 3.261}, {'end': 5367.92, 'text': 'So finally, we have gotten all of the results.', 'start': 5365.099, 'duration': 2.821}, {'end': 5371.281, 'text': 'We have backpropagated all of our visit counts and values and so on.', 'start': 5367.94, 'duration': 3.341}, {'end': 5375.463, 'text': 'So now we would actually like to return the distribution of visit counts.', 'start': 5371.902, 'duration': 3.561}, {'end': 5382.728, 'text': 'So to do that, we should just write new variable action props.', 'start': 5376.523, 'duration': 6.205}, {'end': 5387.112, 'text': 'So the probabilities of which actions look most promising.', 'start': 5382.748, 'duration': 4.364}, {'end': 5390.275, 'text': 'And at the beginning, that should just be np.zeros.', 'start': 5387.853, 'duration': 2.422}, {'end': 5395.019, 'text': 'And the shape here can just be the action size of our game.', 'start': 5390.435, 'duration': 4.584}, {'end': 5398.361, 'text': 'So save.game.get.actionSize.', 'start': 5395.199, 'duration': 3.162}, {'end': 5403.777, 'text': 'So for tic-tac-toe, just this one right here.', 'start': 5400.356, 'duration': 3.421}, {'end': 5409.538, 'text': 'And now we want to loop over all of our children again.', 'start': 5405.497, 'duration': 4.041}, {'end': 5414.139, 'text': "So let's say for child in self.children.", 'start': 5410.919, 'duration': 3.22}, {'end': 5420.661, 'text': "And then we'll write action props at child.actionTaken.", 'start': 5414.159, 'duration': 6.502}, {'end': 5426.081, 'text': 'And this probability should be equal to the visit count of our child.', 'start': 5422.338, 'duration': 3.743}, {'end': 5428.483, 'text': "So let's say child.VisitCount.", 'start': 5426.121, 'duration': 2.362}, {'end': 5434.648, 'text': 'And now we want to turn these into probabilities.', 'start': 5430.044, 'duration': 4.604}, {'end': 5440.252, 'text': "And the way we do that is we're just divided by its sum so that the sum will be equal to one.", 'start': 5434.828, 'duration': 5.424}, {'end': 5445.236, 'text': "And let's just write np.Sum of ActionProps right here.", 'start': 5441.774, 'duration': 3.462}, {'end': 5447.038, 'text': 'And then we can return this.', 'start': 5445.857, 'duration': 1.181}, {'end': 5452.881, 'text': "Nice, so that's looking promising, so let's test this out.", 'start': 5450.58, 'duration': 2.301}, {'end': 5461.503, 'text': 'And there is still invalid syntax here, so I just should remove this here.', 'start': 5454.141, 'duration': 7.362}, {'end': 5462.784, 'text': 'And now this is working.', 'start': 5461.923, 'duration': 0.861}, {'end': 5465.245, 'text': "So let's test this out.", 'start': 5463.444, 'duration': 
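Continuing the Node sketch from above, the expansion and backpropagation steps discussed in these segments could be written along these lines (a sketch; get_next_state, change_perspective and get_opponent_value are assumed to be provided by the game object):

```python
    # (methods continuing the Node class sketched earlier)

    def expand(self):
        # Sample one untried move at random and mark it as used.
        action = np.random.choice(np.where(self.expandable_moves == 1)[0])
        self.expandable_moves[action] = 0

        # Play the move as player 1, then flip the board so the child
        # also "thinks" it is player 1.
        child_state = self.state.copy()
        child_state = self.game.get_next_state(child_state, action, 1)
        child_state = self.game.change_perspective(child_state, player=-1)

        child = Node(self.game, self.args, child_state,
                     parent=self, action_taken=action)
        self.children.append(child)
        return child

    def backpropagate(self, value):
        # Accumulate the result and the visit, then pass the flipped value upwards.
        self.value_sum += value
        self.visit_count += 1
        if self.parent is not None:
            self.parent.backpropagate(self.game.get_opponent_value(value))
```

At the end of search, the root's children's visit counts are collected into an array of length action_size and divided by their sum, which gives exactly the probability distribution the transcript describes.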
1.801}, {'end': 5472.627, 'text': 'And we can create the MCTS object here inside of our test script we used for the game.', 'start': 5465.665, 'duration': 6.962}, {'end': 5477.268, 'text': "So let's write MCTS equals MCTS.", 'start': 5473.747, 'duration': 3.521}, {'end': 5479.429, 'text': "And for the game, we'll use tic-tac-toe.", 'start': 5477.688, 'duration': 1.741}, {'end': 5482.761, 'text': 'And now we still have to define our arguments.', 'start': 5480.498, 'duration': 2.263}, {'end': 5496.575, 'text': 'So for C, which we use in our UCB formula, we can just roughly say that we want the square root of two, which is what you might use generally.', 'start': 5482.781, 'duration': 13.794}, {'end': 5502.562, 'text': 'And then for the number of searches, we can set that to a thousand.', 'start': 5496.595, 'duration': 5.967}, {'end': 5510.726, 'text': "Nice So let's also pass on the arcs here inside of our multicolored research.", 'start': 5505.922, 'duration': 4.804}, {'end': 5515.75, 'text': 'And then during our game, first of all, I should remove that.', 'start': 5512.027, 'duration': 3.723}, {'end': 5523.196, 'text': 'We should say that we only do all of this, so acting ourselves if we are player one.', 'start': 5517.231, 'duration': 5.965}, {'end': 5535.972, 'text': 'And in all other cases, so in tic-tac-toe when player is negative one, we want to do a multicolored research.', 'start': 5528.307, 'duration': 7.665}, {'end': 5542.957, 'text': 'So we can say MCTS props quits MCTS.search of state.', 'start': 5536.433, 'duration': 6.524}, {'end': 5549.715, 'text': 'But you should remember that we always like to be player one here when we do our multicolored research.', 'start': 5544.15, 'duration': 5.565}, {'end': 5552.917, 'text': "So first of all, let's write neutral state equals.", 'start': 5550.215, 'duration': 2.702}, {'end': 5558.422, 'text': 'And then we can say save.game.changePerspective.', 'start': 5553.958, 'duration': 4.464}, {'end': 5560.264, 'text': 'And also not save.game, but tick.total.', 'start': 5558.662, 'duration': 1.602}, {'end': 5568.639, 'text': "And then we'll use this general state here.", 'start': 5565.816, 'duration': 2.823}, {'end': 5572.842, 'text': 'And for the player, we can just set a player, which is here always negative one.', 'start': 5568.819, 'duration': 4.023}, {'end': 5574.504, 'text': 'So we always change the perspective.', 'start': 5572.862, 'duration': 1.642}, {'end': 5580.109, 'text': 'So then instead of MCTS search, we should just use this neutral state we have just created.', 'start': 5574.524, 'duration': 5.585}, {'end': 5583.271, 'text': "And that's great.", 'start': 5582.311, 'duration': 0.96}, {'end': 5586.972, 'text': 'And now, out of these probabilities, we want to sample an action.', 'start': 5583.691, 'duration': 3.281}, {'end': 5592.033, 'text': 'And to keep things easy, we can just sample the action that looks most promising.', 'start': 5587.652, 'duration': 4.381}, {'end': 5595.573, 'text': 'And the way we do that is by using np.argmax.', 'start': 5592.133, 'duration': 3.44}, {'end': 5604.135, 'text': 'And this will just return the child that has been visited most number of times.', 'start': 5600.334, 'duration': 3.801}, {'end': 5609.056, 'text': 'And inside here, we can use the props of our MCTS.', 'start': 5605.275, 'duration': 3.781}, {'end': 5612.904, 'text': 'And this is just the action we have here.', 'start': 5611.102, 'duration': 1.802}, {'end': 5614.886, 'text': "So let's test this out to see if this works.", 'start': 
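The test loop sketched in this segment, with the human as player 1 and the search acting on a perspective-flipped state, might look like this (TicTacToe and MCTS are the classes built earlier in the tutorial; the details here are reconstructed, not verbatim):

```python
import numpy as np

tictactoe = TicTacToe()
args = {'C': 1.41, 'num_searches': 1000}   # roughly sqrt(2) for exploration
mcts = MCTS(tictactoe, args)

state = tictactoe.get_initial_state()
player = 1

while True:
    print(state)
    if player == 1:
        valid_moves = tictactoe.get_valid_moves(state)
        action = int(input(f"valid moves {np.where(valid_moves == 1)[0].tolist()}: "))
    else:
        # The MCTS always searches from player 1's point of view.
        neutral_state = tictactoe.change_perspective(state, player)
        mcts_probs = mcts.search(neutral_state)
        action = np.argmax(mcts_probs)     # greedily play the most-visited move

    state = tictactoe.get_next_state(state, action, player)
    value, is_terminal = tictactoe.get_value_and_terminated(state, action)
    if is_terminal:
        print(state)
        print("player", player, "won" if value == 1 else "draw")
        break
    player = tictactoe.get_opponent(player)
```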
5612.984, 'duration': 1.902}, {'end': 5619.85, 'text': 'Okay, so we still have a problem here.', 'start': 5617.748, 'duration': 2.102}, {'end': 5623.833, 'text': 'Also, for parent, we need to use self.parent.', 'start': 5620.711, 'duration': 3.122}, {'end': 5634.963, 'text': 'And here I have a typo.', 'start': 5633.882, 'duration': 1.081}, {'end': 5641.094, 'text': 'So just want to say best child.', 'start': 5638.392, 'duration': 2.702}, {'end': 5653.022, 'text': "And here, we don't want to use save to children, obviously, but the children of our root node.", 'start': 5647.558, 'duration': 5.464}, {'end': 5659.506, 'text': 'So instead of MCTS, we can say for child in root.children.', 'start': 5653.682, 'duration': 5.824}, {'end': 5663.512, 'text': 'So perfect.', 'start': 5662.671, 'duration': 0.841}, {'end': 5668.076, 'text': 'So now we can see that this Monte Carlo Tree Search moved here inside of the middle.', 'start': 5664.093, 'duration': 3.983}, {'end': 5673.862, 'text': "And now we, as a player, let's play a bit stupid here to check this out.", 'start': 5668.897, 'duration': 4.965}, {'end': 5675.684, 'text': 'We can just play here.', 'start': 5674.403, 'duration': 1.281}, {'end': 5676.765, 'text': 'My MCTS played here.', 'start': 5675.744, 'duration': 1.021}, {'end': 5679.748, 'text': 'Then we can say I will play a 7.', 'start': 5677.306, 'duration': 2.442}, {'end': 5681.049, 'text': 'And now the MCTS played here.', 'start': 5679.748, 'duration': 1.301}, {'end': 5689.673, 'text': 'And from the roots of Tic Tac Toe, you know that we the MCTS has hit three negative ones here instead of this row here.', 'start': 5681.27, 'duration': 8.403}, {'end': 5690.674, 'text': 'So it has one.', 'start': 5690.133, 'duration': 0.541}, {'end': 5691.635, 'text': 'So nice.', 'start': 5690.974, 'duration': 0.661}, {'end': 5694.237, 'text': 'So now we can see that this MCTS is working well.', 'start': 5691.835, 'duration': 2.402}, {'end': 5700.602, 'text': 'Okay, so now that we have our standalone multicolored tree search implemented right here,', 'start': 5694.257, 'duration': 6.345}, {'end': 5712.906, 'text': 'we can now start with building the neural network so that we can then later use the AlphaZero algorithm in order to train this model that can understand how to play these given games.', 'start': 5700.602, 'duration': 12.304}, {'end': 5724.429, 'text': 'Before we start with building, let me just briefly talk about the architecture that we have for our neural network.', 'start': 5714.306, 'duration': 10.123}, {'end': 5727.75, 'text': 'This is just a brief visualization here.', 'start': 5724.889, 'duration': 2.861}, {'end': 5735.885, 'text': 'And first of all, we have this state right here that will give us input to our neural network.', 'start': 5728.983, 'duration': 6.902}, {'end': 5739.827, 'text': 'And in our case, this state will just be a board position.', 'start': 5736.846, 'duration': 2.981}, {'end': 5742.768, 'text': 'So for example, a board of tic-tac-toe right here.', 'start': 5740.147, 'duration': 2.621}, {'end': 5754.035, 'text': 'And actually we encode the state so that later we have these three different planes next to each other, like in this image right here.', 'start': 5743.863, 'duration': 10.172}, {'end': 5760.181, 'text': 'And actually we have one plane for all of the fields in which player negative one has played.', 'start': 5754.896, 'duration': 5.285}, {'end': 5765.393, 'text': 'So where these fields will then be turned to ones and all other fields will just be zeros.', 
'start': 5761.348, 'duration': 4.045}, {'end': 5771.321, 'text': 'And then we also have a plane for all the empty fields and these will then be turned to ones or other fields being zeros.', 'start': 5765.834, 'duration': 5.487}, {'end': 5776.007, 'text': 'And then we also have this last plane for all the fields in which player positive one played.', 'start': 5771.922, 'duration': 4.085}, {'end': 5782.036, 'text': 'And yeah, here again, these fields will be turned to ones and all other fields will be turned to zeros.', 'start': 5776.728, 'duration': 5.308}, {'end': 5786.142, 'text': 'So by having these three planes right here,', 'start': 5782.697, 'duration': 3.445}, {'end': 5793.093, 'text': 'it is easier for our neural network to basically recognize patterns and understand how to play this game right?', 'start': 5786.142, 'duration': 6.951}, {'end': 5804.363, 'text': 'So, essentially, we encode our board game position almost so that it looks like an image afterwards, right?', 'start': 5794.355, 'duration': 10.008}], 'summary': 'Implementing monte carlo tree search algorithm for game analysis and decision making, with a successful test of the mcts in a tic-tac-toe game.', 'duration': 449.891, 'max_score': 5354.472, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB45354472.jpg'}, {'end': 5395.019, 'src': 'embed', 'start': 5367.94, 'weight': 12, 'content': [{'end': 5371.281, 'text': 'We have backpropagated all of our visit counts and values and so on.', 'start': 5367.94, 'duration': 3.341}, {'end': 5375.463, 'text': 'So now we would actually like to return the distribution of visit counts.', 'start': 5371.902, 'duration': 3.561}, {'end': 5382.728, 'text': 'So to do that, we should just write new variable action props.', 'start': 5376.523, 'duration': 6.205}, {'end': 5387.112, 'text': 'So the probabilities of which actions look most promising.', 'start': 5382.748, 'duration': 4.364}, {'end': 5390.275, 'text': 'And at the beginning, that should just be np.zeros.', 'start': 5387.853, 'duration': 2.422}, {'end': 5395.019, 'text': 'And the shape here can just be the action size of our game.', 'start': 5390.435, 'duration': 4.584}], 'summary': 'Backpropagated visit counts and values, returning distribution of visit counts using new variable action props with np.zeros and shape as action size.', 'duration': 27.079, 'max_score': 5367.94, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB45367940.jpg'}, {'end': 5995.267, 'src': 'embed', 'start': 5959.41, 'weight': 1, 'content': [{'end': 5962.591, 'text': 'And at the beginning here, we just have a singular conf block.', 'start': 5959.41, 'duration': 3.181}, {'end': 5967.573, 'text': 'And this conf block will take the output from our backbone as input like this.', 'start': 5963.292, 'duration': 4.281}, {'end': 5974.095, 'text': 'And then after we have gone through that conf block, we will then flatten out the results here.', 'start': 5968.813, 'duration': 5.282}, {'end': 5984.659, 'text': 'And then we just have this linear layer or fully connected layer between the outputs for our conf block and then also these neurons right here.', 'start': 5975.472, 'duration': 9.187}, {'end': 5995.267, 'text': 'And at the end we just want to have nine neurons in the case of tic-tac-toe, because we want to have one neuron for each potential action right?', 'start': 5985.519, 'duration': 9.748}], 'summary': 'Neural network architecture includes conf block, 
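The three-plane encoding described above can be sketched as a small helper (in the tutorial this is a get_encoded_state method on the game class; the standalone form here is just for illustration):

```python
import numpy as np

def get_encoded_state(state):
    # One plane per "colour": fields where player -1 played, empty fields,
    # and fields where player +1 played, stacked like image channels.
    encoded_state = np.stack(
        (state == -1, state == 0, state == 1)
    ).astype(np.float32)
    return encoded_state

state = np.array([[0, 0, 0],
                  [0, -1, 0],
                  [1, 0, 0]])
print(get_encoded_state(state).shape)  # (3, 3, 3): planes x rows x columns
```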
flattening, and 9 output neurons for tic-tac-toe.', 'duration': 35.857, 'max_score': 5959.41, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB45959410.jpg'}, {'end': 6060.939, 'src': 'embed', 'start': 6008.943, 'weight': 0, 'content': [{'end': 6017.91, 'text': 'But when we want to get this readable distribution telling us where to play, we also have to apply the softmax function.', 'start': 6008.943, 'duration': 8.967}, {'end': 6025.756, 'text': 'And this will basically turn the outputs from our nine neurons to this distribution of probabilities.', 'start': 6018.47, 'duration': 7.286}, {'end': 6034.863, 'text': 'And yeah, then each probability will basically indicate how promising a certain action is.', 'start': 6026.316, 'duration': 8.547}, {'end': 6042.949, 'text': 'This is why we have these nine neurons here and then why we also in practice later call the softmax method.', 'start': 6035.144, 'duration': 7.805}, {'end': 6053.735, 'text': 'The second head is this value head right here and here we also have the singular conv block that is different from this one right here.', 'start': 6043.069, 'duration': 10.666}, {'end': 6060.939, 'text': 'We have, uh, this connection here as well and this also takes the output from our backbone as input.', 'start': 6053.775, 'duration': 7.164}], 'summary': 'Neural network applies softmax function for 9 neurons to determine action probabilities and value.', 'duration': 51.996, 'max_score': 6008.943, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB46008943.jpg'}, {'end': 6117.109, 'src': 'embed', 'start': 6088.21, 'weight': 4, 'content': [{'end': 6090.891, 'text': 'And because of that we only need one neuron right?', 'start': 6088.21, 'duration': 2.681}, {'end': 6103.674, 'text': 'And the best way to get this range of negative one to positive one is by also applying this tanh activation function onto this last singular neuron right here.', 'start': 6091.551, 'duration': 12.123}, {'end': 6107.134, 'text': "So here's just a visualization of the tanh function.", 'start': 6104.074, 'duration': 3.06}, {'end': 6113.696, 'text': 'And basically this will just squish all of our potential values into this range of negative one to positive one.', 'start': 6107.715, 'duration': 5.981}, {'end': 6117.109, 'text': 'And yeah, this is exactly what we want for our value head.', 'start': 6114.528, 'duration': 2.581}], 'summary': 'Applying the tanh activation function squishes values to -1 to +1 range, meeting the requirement for the value head.', 'duration': 28.899, 'max_score': 6088.21, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB46088210.jpg'}], 'start': 4678.448, 'title': 'Implementing monte carlo tree search and neural network training', 'summary': 'Discusses the implementation of monte carlo tree search logic and algorithm, including its application for one-player games and tic-tac-toe, as well as the use of resnet architecture for neural network training in pytorch, emphasizing the importance of specific versions for multi-target cross entropy loss and cuda support.', 'chapters': [{'end': 4859.87, 'start': 4678.448, 'title': 'Monte carlo tree search logic', 'summary': 'Discusses the implementation of monte carlo tree search logic for game states, including the concept of changing player perspectives and its applications for one-player games.', 'duration': 181.422, 'highlights': ["The implementation
includes the concept of changing player perspectives, allowing for the perception of a different player while retaining the original player's perspective.", 'The code is designed to be valid for one-player games, simplifying the logic and making it applicable to a wider range of games.', "A method for changing the perspective of the state and player is introduced, demonstrating the transformation of positive numbers to negative ones based on the player's perspective in tic-tac-toe."]}, {'end': 5465.245, 'start': 4860.951, 'title': 'Mcts algorithm implementation', 'summary': 'Covers the implementation of the mcts algorithm, including creating child nodes, performing rollouts, backpropagation, and returning the distribution of visit counts for promising actions.', 'duration': 604.294, 'highlights': ['Creating Child Nodes The code demonstrates the creation of child nodes by using the save.game, save.args, and child state, followed by appending the child to the list of children and returning the created child.', 'Performing Rollouts The process of performing rollouts involves simulating random actions until reaching a terminal node, determining the outcome of the game, and using this information for backpropagation, prioritizing nodes where the player won.', 'Backpropagation The backpropagation process involves adding the value to the value sum, incrementing the visit count, and propagating up to the root node while considering the parent as a different player, ultimately returning the distribution of visit counts for promising actions.']}, {'end': 5826.676, 'start': 5465.665, 'title': 'Implementing monte carlo tree search for tic-tac-toe', 'summary': 'Covers implementing a monte carlo tree search (mcts) for tic-tac-toe, setting parameters for the search, adjusting perspectives, and testing the mcts performance, leading to the plan to integrate it with the alphazero algorithm for training neural networks.', 'duration': 361.011, 'highlights': ['Implementing MCTS for tic-tac-toe The chapter discusses the implementation of a Monte Carlo Tree Search (MCTS) for the game of tic-tac-toe, showcasing the process of setting up the MCTS object and defining parameters such as C and the number of searches.', "Encoding game state for the neural network The transcript explains encoding the tic-tac-toe board state into three planes representing the fields played by player negative one, empty fields, and player positive one, optimizing the neural network's ability to recognize patterns and play the game effectively.", 'Integration with AlphaZero algorithm The plan to integrate the MCTS with the AlphaZero algorithm for training neural networks is mentioned, indicating a strategic approach to build and train models for playing games effectively.']}, {'end': 6219.952, 'start': 5826.996, 'title': 'Neural network architecture and training in pytorch', 'summary': 'Discusses the use of resnet architecture in the backbone of a neural network, with skip connections and the application of softmax function for generating distribution of probabilities. it also explains the construction of policy head and value head, the use of pytorch framework, and the importance of importing specific versions for multi-target cross entropy loss and cuda support.', 'duration': 392.956, 'highlights': ['The ResNet architecture uses skip connections to store the residual values, making the model more flexible and allowing it to mask out convolutional blocks, ultimately improving the understanding of images. 
The ResNet architecture employs skip connections to store residual values, improving model flexibility and image understanding.', 'The construction of the policy head involves using a singular convolutional block to process the backbone output, followed by flattening and a fully connected layer to produce nine neurons representing potential actions in tic-tac-toe. The policy head construction includes a singular convolutional block, flattening, and a fully connected layer to output nine neurons representing potential actions in tic-tac-toe.', 'The application of softmax function transforms the logits from the neurons into a distribution of probabilities, indicating the potential of certain actions. The softmax function transforms the logits into a distribution of probabilities, indicating the potential of certain actions.', 'The value head construction includes a singular convolutional block, flattening, and a fully connected layer to produce one neuron representing the estimation of the state, with the 10H activation function used to achieve a range of negative one to positive one. The value head construction involves a singular convolutional block, flattening, and a fully connected layer to output one neuron representing the estimation of the state, using the 10H activation function to achieve a range of negative one to positive one.', 'The use of PyTorch framework is emphasized, along with the recommendation to import specific versions for multi-target cross entropy loss and CUDA support, depending on the system setup. The importance of using the PyTorch framework and importing specific versions for multi-target cross entropy loss and CUDA support is highlighted.']}], 'duration': 1541.504, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB44678448.jpg', 'highlights': ["The implementation includes the concept of changing player perspectives, allowing for the perception of a different player while retaining the original player's perspective.", 'The code is designed to be valid for one-player games, simplifying the logic and making it applicable to a wider range of games.', "A method for changing the perspective of the state and player is introduced, demonstrating the transformation of positive numbers to negative ones based on the player's perspective in tic-tac-toe.", 'Creating Child Nodes The code demonstrates the creation of child nodes by using the save.game, save.args, and child state, followed by appending the child to the list of children and returning the created child.', 'Performing Rollouts The process of performing rollouts involves simulating random actions until reaching a terminal node, determining the outcome of the game, and using this information for backpropagation, prioritizing nodes where the player won.', 'Backpropagation The backpropagation process involves adding the value to the value sum, incrementing the visit count, and propagating up to the root node while considering the parent as a different player, ultimately returning the distribution of visit counts for promising actions.', 'Implementing MCTS for tic-tac-toe The chapter discusses the implementation of a Monte Carlo Tree Search (MCTS) for the game of tic-tac-toe, showcasing the process of setting up the MCTS object and defining parameters such as C and the number of searches.', "Encoding game state for the neural network The transcript explains encoding the tic-tac-toe board state into three planes representing the fields played by player negative one, 
empty fields, and player positive one, optimizing the neural network's ability to recognize patterns and play the game effectively.", 'Integration with AlphaZero algorithm The plan to integrate the MCTS with the AlphaZero algorithm for training neural networks is mentioned, indicating a strategic approach to build and train models for playing games effectively.', 'The ResNet architecture uses skip connections to store the residual values, making the model more flexible and allowing it to mask out convolutional blocks, ultimately improving the understanding of images.', 'The construction of the policy head involves using a singular convolutional block to process the backbone output, followed by flattening and a fully connected layer to produce nine neurons representing potential actions in tic-tac-toe.', 'The application of softmax function transforms the logits from the neurons into a distribution of probabilities, indicating the potential of certain actions.', 'The value head construction includes a singular convolutional block, flattening, and a fully connected layer to produce one neuron representing the estimation of the state, using the 10H activation function to achieve a range of negative one to positive one.', 'The use of PyTorch framework is emphasized, along with the recommendation to import specific versions for multi-target cross entropy loss and CUDA support, depending on the system setup.']}, {'end': 7658.345, 'segs': [{'end': 7144.569, 'src': 'heatmap', 'start': 6990.324, 'weight': 0.779, 'content': [{'end': 6995.611, 'text': 'Okay So now we can just print the state for now.', 'start': 6990.324, 'duration': 5.287}, {'end': 6998.696, 'text': 'And yeah, we get this state right here.', 'start': 6997.415, 'duration': 1.281}, {'end': 7006.5, 'text': 'Okay, so next we should remember that we also have to encode our state when we give it to our model.', 'start': 6999.396, 'duration': 7.104}, {'end': 7010.122, 'text': 'So we also have to write this new method right here.', 'start': 7007.26, 'duration': 2.862}, {'end': 7017.406, 'text': 'So here we just write def getEncodedState of self and then just of the state right here.', 'start': 7010.162, 'duration': 7.244}, {'end': 7029.584, 'text': 'And remember that we want to have these three planes, right? 
So we can get this by writing nCodedState equals np.Stack.', 'start': 7018.426, 'duration': 11.158}, {'end': 7039.403, 'text': 'And then here we want to stack first of all the state for all fields that are equal to negative one with the state of all empty fields,', 'start': 7031.781, 'duration': 7.622}, {'end': 7048.125, 'text': 'so the fields that are equal to zero, with the state for all of the fields that are equal to positive one, like this', 'start': 7039.403, 'duration': 8.722}, {'end': 7049.865, 'text': 'So yeah, this should be our encoded state.', 'start': 7048.205, 'duration': 1.66}, {'end': 7054.407, 'text': 'And here these will just be booleans, but rather we want to have floats.', 'start': 7050.686, 'duration': 3.721}, {'end': 7057.627, 'text': "So let's just set the type here to np.float32.", 'start': 7054.827, 'duration': 2.8}, {'end': 7062.624, 'text': 'And now we can just return this encoded state.', 'start': 7060.362, 'duration': 2.262}, {'end': 7080.037, 'text': "Okay, so let's test this out by writing encoded state and setting that equal to tick tock toe, but get encoded state of the state right here.", 'start': 7062.644, 'duration': 17.393}, {'end': 7081.298, 'text': "Let's just print that here.", 'start': 7080.277, 'duration': 1.021}, {'end': 7090.597, 'text': 'Okay, so now we see that we have this just stayed right here.', 'start': 7081.318, 'duration': 9.279}, {'end': 7097.102, 'text': 'And this is our encoded state, right? So first of all, we have this plane for all of the fields in which player negative one is played.', 'start': 7090.657, 'duration': 6.445}, {'end': 7100.584, 'text': 'So just this field right here, which is encoded as a one here.', 'start': 7097.142, 'duration': 3.442}, {'end': 7103.466, 'text': 'And then we have these empty fields right here.', 'start': 7101.625, 'duration': 1.841}, {'end': 7106.929, 'text': 'And then we also have these fields where player positive one played.', 'start': 7103.626, 'duration': 3.303}, {'end': 7111.472, 'text': 'So just this field right here, right? Okay, so this is great.', 'start': 7107.049, 'duration': 4.423}, {'end': 7120.639, 'text': 'And now we want to get this policy and this value, right? 
So first of all, we have to turn this state to a tensor now.', 'start': 7111.852, 'duration': 8.787}, {'end': 7130.494, 'text': 'So because of that, we can write tensor state and set that equal to torch.tensor of encoded state like this.', 'start': 7121.701, 'duration': 8.793}, {'end': 7139.784, 'text': 'Then when we give a tensor to our model as input, we also always need this batch dimension, right?', 'start': 7131.974, 'duration': 7.81}, {'end': 7144.569, 'text': 'But here we just have one state and not a whole batch of states.', 'start': 7141.245, 'duration': 3.324}], 'summary': 'Creating an encoded state for tic-tac-toe game using numpy and torch.tensor', 'duration': 154.245, 'max_score': 6990.324, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB46990324.jpg'}, {'end': 7054.407, 'src': 'embed', 'start': 7031.781, 'weight': 2, 'content': [{'end': 7039.403, 'text': 'And then here we want to stack first of all the state for all fields that are equal to negative one with the state of all empty fields,', 'start': 7031.781, 'duration': 7.622}, {'end': 7048.125, 'text': 'so the fields that are equal to zero, with the state for all of the fields that are equal to positive one, like this', 'start': 7039.403, 'duration': 8.722}, {'end': 7049.865, 'text': 'So yeah, this should be our encoded state.', 'start': 7048.205, 'duration': 1.66}, {'end': 7054.407, 'text': 'And here these will just be booleans, but rather we want to have floats.', 'start': 7050.686, 'duration': 3.721}], 'summary': 'Stack state for -1, 0, and 1 fields to encode. use floats instead of booleans.', 'duration': 22.626, 'max_score': 7031.781, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB47031781.jpg'}, {'end': 7120.639, 'src': 'embed', 'start': 7090.657, 'weight': 1, 'content': [{'end': 7097.102, 'text': 'And this is our encoded state, right? So first of all, we have this plane for all of the fields in which player negative one is played.', 'start': 7090.657, 'duration': 6.445}, {'end': 7100.584, 'text': 'So just this field right here, which is encoded as a one here.', 'start': 7097.142, 'duration': 3.442}, {'end': 7103.466, 'text': 'And then we have these empty fields right here.', 'start': 7101.625, 'duration': 1.841}, {'end': 7106.929, 'text': 'And then we also have these fields where player positive one played.', 'start': 7103.626, 'duration': 3.303}, {'end': 7111.472, 'text': 'So just this field right here, right? Okay, so this is great.', 'start': 7107.049, 'duration': 4.423}, {'end': 7120.639, 'text': 'And now we want to get this policy and this value, right? 
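Feeding a single encoded state through the model, as described here, might look like the following sketch. It assumes get_encoded_state, state and model from the surrounding chapter; the batch dimension is added with unsqueeze(0):

```python
import torch

# (1, 3, 3, 3): batch of one, three planes, 3x3 board
tensor_state = torch.tensor(get_encoded_state(state)).unsqueeze(0)

policy, value = model(tensor_state)
value = value.item()                                               # scalar in [-1, 1]
policy = torch.softmax(policy, dim=1).squeeze(0).detach().numpy()  # 9 probabilities

print(value)
print(policy)
```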
So first of all, we have to turn this state to a tensor now.', 'start': 7111.852, 'duration': 8.787}], 'summary': 'The transcript discusses encoding of player states and the conversion of state to a tensor.', 'duration': 29.982, 'max_score': 7090.657, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB47090657.jpg'}, {'end': 7349.703, 'src': 'embed', 'start': 7321.961, 'weight': 3, 'content': [{'end': 7324.803, 'text': 'So this is basically how our policy looks like.', 'start': 7321.961, 'duration': 2.842}, {'end': 7331.849, 'text': 'So for each action, we have this bar right here telling us how promising this action is.', 'start': 7324.863, 'duration': 6.986}, {'end': 7338.414, 'text': "And then obviously our model was just initiated randomly currently, so we can't expect too much here.", 'start': 7332.49, 'duration': 5.924}, {'end': 7349.703, 'text': 'So nothing actually, but when we later have this trained model, we can expect this nice distribution of bars telling us where to play, right? Okay.', 'start': 7339.095, 'duration': 10.608}], 'summary': 'Policy shows promising action with trained model for better distribution.', 'duration': 27.742, 'max_score': 7321.961, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB47321961.jpg'}, {'end': 7467.258, 'src': 'embed', 'start': 7439.497, 'weight': 0, 'content': [{'end': 7449.606, 'text': 'So that when we choose a node during our selection phase, we are more likely to choose nodes with higher policy values,', 'start': 7439.497, 'duration': 10.109}, {'end': 7454.769, 'text': 'because these were the ones that seemed more promising to our model right?', 'start': 7449.606, 'duration': 5.163}, {'end': 7462.955, 'text': 'So we want to choose these more often and walk down the tree inside of these directions where our model guides us.', 'start': 7454.929, 'duration': 8.026}, {'end': 7467.258, 'text': 'So this is the one thing we use our model for.', 'start': 7463.855, 'duration': 3.403}], 'summary': 'Choosing nodes with higher policy values guides our model during selection phase.', 'duration': 27.761, 'max_score': 7439.497, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB47439497.jpg'}], 'start': 6223.167, 'title': 'Neural network model creation and integration for game ai', 'summary': 'Covers the creation of neural network models for mcts implementation and game ai, including architecture creation, resnet implementation, and model usage in monte carlo tree search.', 'chapters': [{'end': 6514.588, 'start': 6223.167, 'title': 'Neural network model creation', 'summary': 'Covers the creation of a neural network model for mcts implementation, including the initialization of the model, the structure of the backbone, and the definition of res blocks and their components.', 'duration': 291.421, 'highlights': ["The creation of a model for MCTS implementation involves defining a class 'resnet' inheriting from 'nn.module', with parameters such as game, number of res blocks, and hidden size for conf blocks.", 'Explanation of the structure of the start block, including the use of conf block, batch norm, and value to optimize training speed and safety.', "The process of creating a backbone using 'nn.moduleList' and an array of different res blocks for the MCTS implementation."]}, {'end': 6852.271, 'start': 6515.228, 'title': 'Neural network architecture for game ai', 'summary': 'Explains the process 
of creating a neural network architecture for a game ai, including the creation of residual connections and the definition of policy and value heads using specific layers and dimensions.', 'duration': 337.043, 'highlights': ['Creating residual connections for skip connections The process involves setting the residual equal to X, updating X with conf blocks, summing up the output with the residual, and returning the final output as the sum.', 'Defining policy head using specific layers and dimensions The policy head is defined using nn.Sequential with a conf block, flattening of results, a linear layer with specific input and output dimensions, and multiplication for the input size.', 'Defining value head with specific layers and dimensions The value head is defined using nn.Sequential with a conf block, BatchNorm2D, flattening of results, a linear layer with specific input and output dimensions, and the addition of a 10H activation function.']}, {'end': 7412.813, 'start': 6852.832, 'title': 'Implementing resnet and neural network integration', 'summary': 'Covers implementing the forward method for resnet class, testing the model with a tic-tac-toe game, encoding the game state, processing policy and value, and setting a seed for reproducible results.', 'duration': 559.981, 'highlights': ['The chapter covers implementing the forward method for the ResNet class, which involves sending the input through start block and looping over res blocks in the backbone, with the number of res blocks set to four and the number of hidden set to 64.', 'Testing the model with a tic-tac-toe game involves creating the game, updating the state, encoding the state with three planes representing fields for player -1, empty fields, and player 1, and converting the state to a tensor for further processing.', 'Processing the policy and value includes getting the float value from the tensor, applying softmax on the policy, visualizing the policy distribution using matplotlib, and setting a seed for reproducible results using torch.manual_seed(0).']}, {'end': 7658.345, 'start': 7413.774, 'title': 'Monte carlo tree search in reinforcement learning', 'summary': 'Explains the process of using a model in monte carlo tree search to guide node selection and back propagation, including encoding state, obtaining policy and value, and transforming logits into a distribution of likelihoods.', 'duration': 244.571, 'highlights': ["Using the model to guide node selection during the selection phase based on higher policy values increases the likelihood of choosing nodes that seem more promising to the model. This approach increases the likelihood of choosing nodes with higher policy values, improving the model's guidance in the tree traversal.", "Using the value from the model to back propagate eliminates the need for random action rollouts, as it directly integrates the model's value. By using the model's value for back propagation, the need for random action rollouts is eliminated, streamlining the process.", 'Encoding the state of a node into a tensor and obtaining policy and value from the model by passing the encoded state as input. The process involves encoding the state into a tensor, passing it to the model, and obtaining policy and value as results.', 'Transforming policy logits into a distribution of likelihoods using softmax and adjusting the axes, followed by converting to NumPy arrays. 
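Putting the pieces from this chapter together, the network might be sketched as below: a start block, a backbone of residual blocks, and separate policy and value heads. The hyperparameters (4 res blocks, 64 hidden channels, tanh on the value head) follow the transcript; the head channel counts and the game attributes row_count, column_count and action_size are assumptions about the tutorial's interface:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    def __init__(self, num_hidden):
        super().__init__()
        self.conv1 = nn.Conv2d(num_hidden, num_hidden, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(num_hidden)
        self.conv2 = nn.Conv2d(num_hidden, num_hidden, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(num_hidden)

    def forward(self, x):
        residual = x                         # skip connection stores the input
        x = F.relu(self.bn1(self.conv1(x)))
        x = self.bn2(self.conv2(x))
        x += residual                        # add it back before the final ReLU
        return F.relu(x)

class ResNet(nn.Module):
    def __init__(self, game, num_res_blocks, num_hidden):
        super().__init__()
        self.start_block = nn.Sequential(
            nn.Conv2d(3, num_hidden, kernel_size=3, padding=1),  # 3 encoded planes in
            nn.BatchNorm2d(num_hidden),
            nn.ReLU(),
        )
        self.backbone = nn.ModuleList(
            [ResBlock(num_hidden) for _ in range(num_res_blocks)]
        )
        self.policy_head = nn.Sequential(
            nn.Conv2d(num_hidden, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * game.row_count * game.column_count, game.action_size),
        )
        self.value_head = nn.Sequential(
            nn.Conv2d(num_hidden, 3, kernel_size=3, padding=1),
            nn.BatchNorm2d(3),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(3 * game.row_count * game.column_count, 1),
            nn.Tanh(),                       # value squashed into [-1, 1]
        )

    def forward(self, x):
        x = self.start_block(x)
        for res_block in self.backbone:
            x = res_block(x)
        return self.policy_head(x), self.value_head(x)
```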
The transformation involves applying softmax to the policy logits, adjusting the axes, and converting the result to a NumPy array.']}], 'duration': 1435.178, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB46223167.jpg', 'highlights': ["The creation of a model for MCTS implementation involves defining a class 'resnet' inheriting from 'nn.module', with parameters such as game, number of res blocks, and hidden size for conf blocks.", 'Creating residual connections for skip connections The process involves setting the residual equal to X, updating X with conf blocks, summing up the output with the residual, and returning the final output as the sum.', 'The chapter covers implementing the forward method for the ResNet class, which involves sending the input through start block and looping over res blocks in the backbone, with the number of res blocks set to four and the number of hidden set to 64.', "Using the model to guide node selection during the selection phase based on higher policy values increases the likelihood of choosing nodes that seem more promising to the model. This approach increases the likelihood of choosing nodes with higher policy values, improving the model's guidance in the tree traversal."]}, {'end': 8540.263, 'segs': [{'end': 8181.447, 'src': 'heatmap', 'start': 8024.318, 'weight': 0.883, 'content': [{'end': 8034.08, 'text': 'And so, first of all, we have to say that if we call this UCB formula on a child that has a visit count of zero,', 'start': 8024.318, 'duration': 9.762}, {'end': 8036.541, 'text': "then we can't calculate the Q value right?", 'start': 8034.08, 'duration': 2.461}, {'end': 8039.921, 'text': "So let's just set the Q value to zero in that case as well.", 'start': 8036.721, 'duration': 3.2}, {'end': 8052.684, 'text': "So let's say if child.VisitCount equals zero, then Q value should equal zero and then else, sorry,", 'start': 8040.501, 'duration': 12.183}, {'end': 8056.207, 'text': 'Q value should be equal to this formula right here.', 'start': 8053.365, 'duration': 2.842}, {'end': 8063.993, 'text': "So because now we don't back propagate immediately on a node that has just been created during expanding.", 'start': 8057.268, 'duration': 6.725}, {'end': 8072.159, 'text': "So it is actually possible now to call this UCB method on a child that hasn't been visited before.", 'start': 8064.793, 'duration': 7.366}, {'end': 8075.661, 'text': 'So yeah, this way we change this part right here.', 'start': 8072.879, 'duration': 2.782}, {'end': 8079.204, 'text': 'And then also we want to update this part here below.', 'start': 8076.322, 'duration': 2.882}, {'end': 8080.205, 'text': "So that's right.", 'start': 8079.304, 'duration': 0.901}, {'end': 8083.425, 'text': 'it like this image here.', 'start': 8081.503, 'duration': 1.922}, {'end': 8089.291, 'text': "So let's first of all remove this math.log from here.", 'start': 8084.246, 'duration': 5.045}, {'end': 8095.078, 'text': 'And then here we want to add a one, the visit count of our child.', 'start': 8089.752, 'duration': 5.326}, {'end': 8102.591, 'text': 'And then we also want to multiply everything up with the prior of our child.', 'start': 8097.15, 'duration': 5.441}, {'end': 8111.813, 'text': 'So that we also use the policy when we select downwards the tree.', 'start': 8103.051, 'duration': 8.762}, {'end': 8115.094, 'text': 'So this should be working now.', 'start': 8113.093, 'duration': 2.001}, {'end': 8119.174, 'text': "So let's see if we got everything 
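The adjusted UCB formula described here (Q forced to 0 for unvisited children, no logarithm, a +1 on the child's visit count, and the child's prior as a multiplicative factor) could replace the earlier get_ucb roughly like this; child.prior is the policy probability stored when the child was expanded:

```python
import math

def get_ucb(self, child):
    if child.visit_count == 0:
        # Unvisited children can now be reached, so there is no Q estimate yet.
        q_value = 0
    else:
        q_value = 1 - ((child.value_sum / child.visit_count) + 1) / 2
    # AlphaZero-style exploration term: no log, +1 in the denominator,
    # and the whole term scaled by the child's prior policy probability.
    return q_value + self.args['C'] * (
        math.sqrt(self.visit_count) / (child.visit_count + 1)
    ) * child.prior
```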
ready.", 'start': 8115.994, 'duration': 3.18}, {'end': 8120.475, 'text': "So yeah, I think that's it.", 'start': 8119.294, 'duration': 1.181}, {'end': 8123.375, 'text': "So let's just run this right here.", 'start': 8120.715, 'duration': 2.66}, {'end': 8126.856, 'text': 'So first of all, I will just update C to be equal to two.', 'start': 8123.555, 'duration': 3.301}, {'end': 8130.369, 'text': 'And then we can create a model right here.', 'start': 8127.748, 'duration': 2.621}, {'end': 8132.849, 'text': "So let's write model equals resnet.", 'start': 8130.829, 'duration': 2.02}, {'end': 8138.69, 'text': 'And then for the number, for the game, we can just do tic-tac-toe.', 'start': 8134.149, 'duration': 4.541}, {'end': 8142.531, 'text': "For the number of rest blocks, let's just say four.", 'start': 8139.831, 'duration': 2.7}, {'end': 8147.332, 'text': 'And for the number of our hidden planes, we can just set that to 64.', 'start': 8142.731, 'duration': 4.601}, {'end': 8149.793, 'text': 'Then, you know what, also evaluate our model right here.', 'start': 8147.332, 'duration': 2.461}, {'end': 8159.491, 'text': "Currently, this model has just been initiated randomly, right? So we can't expect too much, but let's try this out.", 'start': 8151.36, 'duration': 8.131}, {'end': 8164.778, 'text': "So, oh, sorry, we're still missing the model.", 'start': 8161.954, 'duration': 2.824}, {'end': 8166.801, 'text': "Here's the argument inside of ORM-CTS.", 'start': 8164.958, 'duration': 1.843}, {'end': 8168.303, 'text': 'So yeah, like this.', 'start': 8167.482, 'duration': 0.821}, {'end': 8173.983, 'text': 'so nice.', 'start': 8173.163, 'duration': 0.82}, {'end': 8181.447, 'text': "so now we see that with our updated mcts um, we played right here and you know what let's say.", 'start': 8173.983, 'duration': 7.464}], 'summary': 'Updating ucb formula and testing with tic-tac-toe game.', 'duration': 157.129, 'max_score': 8024.318, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB48024318.jpg'}, {'end': 8212.48, 'src': 'embed', 'start': 8181.447, 'weight': 2, 'content': [{'end': 8186.49, 'text': 'um, we would like to play on one.', 'start': 8181.447, 'duration': 5.043}, {'end': 8197.115, 'text': "so yeah, the mcts played here, uh, and let's play, uh, i don't know, just on this edge right here, and the mcts played here, so negative one,", 'start': 8186.49, 'duration': 10.625}, {'end': 8198.812, 'text': 'and has won the game.', 'start': 8197.811, 'duration': 1.001}, {'end': 8202.294, 'text': 'So now we have our updated Monte Carlo tree search ready.', 'start': 8199.451, 'duration': 2.843}, {'end': 8207.877, 'text': 'And next we can actually start to build this main AlphaZero algorithm.', 'start': 8202.714, 'duration': 5.163}, {'end': 8212.48, 'text': 'So in order to do that, let us just create an AlphaZero class.', 'start': 8208.597, 'duration': 3.883}], 'summary': 'Mcts won with -1, now building alphazero algorithm.', 'duration': 31.033, 'max_score': 8181.447, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB48181447.jpg'}, {'end': 8305.897, 'src': 'embed', 'start': 8277.714, 'weight': 0, 'content': [{'end': 8284.379, 'text': 'And when we initiate our multicolored research, we want to pass in the game, the arcs, and the model.', 'start': 8277.714, 'duration': 6.665}, {'end': 8288.583, 'text': "So let's just write game, arcs, model, just like this.", 'start': 8284.52, 'duration': 4.063}, {'end': 
8294.107, 'text': 'So yeah, this is just the standard init of our AlphaZero class.', 'start': 8290.683, 'duration': 3.424}, {'end': 8298.331, 'text': 'And then next, we want to define the method inside.', 'start': 8294.888, 'duration': 3.443}, {'end': 8305.897, 'text': 'And from the overview, you might remember that we have these two components, which is the self-play part and then the training part.', 'start': 8298.991, 'duration': 6.906}], 'summary': 'Initiating multicolored research with game, arcs, and model. defining self-play and training parts.', 'duration': 28.183, 'max_score': 8277.714, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB48277714.jpg'}, {'end': 8383.829, 'src': 'embed', 'start': 8354.77, 'weight': 3, 'content': [{'end': 8359.731, 'text': "So let's say for iteration in self.args of numIterations.", 'start': 8354.77, 'duration': 4.961}, {'end': 8368.983, 'text': 'And then for each iteration, we want to create this memory class.', 'start': 8364.281, 'duration': 4.702}, {'end': 8373.044, 'text': 'So just the training data essentially for one cycle.', 'start': 8369.022, 'duration': 4.022}, {'end': 8376.566, 'text': "So let's write our memory list here.", 'start': 8373.343, 'duration': 3.223}, {'end': 8378.607, 'text': "So let's define memory right here.", 'start': 8377.066, 'duration': 1.541}, {'end': 8383.829, 'text': 'And then we want to loop over all of our self play games.', 'start': 8379.787, 'duration': 4.042}], 'summary': 'Loop through numiterations to create memory for training data and self play games.', 'duration': 29.059, 'max_score': 8354.77, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB48354770.jpg'}, {'end': 8497.056, 'src': 'embed', 'start': 8431.723, 'weight': 1, 'content': [{'end': 8441.928, 'text': "So let's write memory plus equals self.selfplay, just like this, so that we extend the new memory.", 'start': 8431.723, 'duration': 10.205}, {'end': 8452.954, 'text': "And actually we'd like to change the mode of our model so that the model is now in eval mode, so that we don't have these batch norms,", 'start': 8442.992, 'duration': 9.962}, {'end': 8454.494, 'text': 'for example during self-play.', 'start': 8452.954, 'duration': 1.54}, {'end': 8458.535, 'text': 'And now we want to move on to the training part.', 'start': 8456.195, 'duration': 2.34}, {'end': 8462.096, 'text': 'So first of all, we can write self.model.train like this.', 'start': 8458.595, 'duration': 3.501}, {'end': 8471.858, 'text': 'And then we can say for epoch in range of self.args num epochs.', 'start': 8464.396, 'duration': 7.462}, {'end': 8478.307, 'text': 'And yeah, for each epochs, we like to call this train method right here.', 'start': 8474.784, 'duration': 3.523}, {'end': 8481.589, 'text': 'And we like to give the memory as input.', 'start': 8478.647, 'duration': 2.942}, {'end': 8484.992, 'text': "So let's write save.train of memory like this.", 'start': 8481.629, 'duration': 3.363}, {'end': 8489.035, 'text': "So let's also add memory here as an argument.", 'start': 8485.872, 'duration': 3.163}, {'end': 8491.772, 'text': "So that's great.", 'start': 8491.152, 'duration': 0.62}, {'end': 8497.056, 'text': 'And at the end of an iteration, we like to store the weights of our model.', 'start': 8492.173, 'duration': 4.883}], 'summary': 'Update memory, switch model mode to eval, train model for num epochs, and store model weights.', 'duration': 65.333, 'max_score': 8431.723, 
'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB48431723.jpg'}], 'start': 7658.365, 'title': 'Mcts and alphazero optimization', 'summary': 'Discusses optimizing monte carlo tree search using the nograd decorator and masking out illegal states, and updating mcts for alphazero by rescaling policy, expanding in all directions, and creating the alphazero class for continuous learning.', 'chapters': [{'end': 7707.159, 'start': 7658.365, 'title': 'Monte carlo tree search optimization', 'summary': 'Discusses the optimization of monte carlo tree search using the nograd decorator for faster performance and the masking out of illegal states and moves to improve the efficiency of the search. it also covers the process of obtaining valid moves and applying them to the state of the leaf node.', 'duration': 48.794, 'highlights': ['The use of the noGrad decorator has been implemented to speed up the Monte Carlo tree search, resulting in improved efficiency and faster runtime.', 'Masking out illegal states and moves has been emphasized as a crucial step to prevent expansion in directions where a player has already played, contributing to the overall optimization of the search process.', 'The process of obtaining valid moves and applying them to the state of the leaf node has been explained as a key aspect of ensuring the search algorithm operates effectively and accurately.']}, {'end': 8540.263, 'start': 7709.02, 'title': 'Updating mcts and alphazero', 'summary': 'Describes updating the mcts algorithm for alphazero, including rescaling policy for percentages, getting float value from a neuron, expanding in all possible directions, and updating ucb formula, and then creating the alphazero class with self-play, training, and learn methods for continuous learning.', 'duration': 831.243, 'highlights': ['Rescaling policy to have percentages by dividing with its own sum, resulting in a sum of policy equal to one.', 'Getting float value from a neuron of the value head using the dot item method in PyTorch.', 'Expanding in all possible directions immediately after reaching the leaf node, leading to the removal of the simulation method and expandable moves.', 'Updating the UCB formula by removing math.log, adding one to the visit count of the child, and multiplying everything with the prior of the child for selection downwards the tree.', 'Creating the AlphaZero class with the self-play, training, and learn methods for continuous learning, involving looping over iterations, creating memory class, changing model mode to eval during self-play, training the model, and storing weights and state dict of the model and optimizer.']}], 'duration': 881.898, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB47658365.jpg', 'highlights': ['The AlphaZero class is created for continuous learning, involving self-play, training, and learning methods.', 'The use of the noGrad decorator has been implemented to speed up the Monte Carlo tree search, resulting in improved efficiency and faster runtime.', 'Masking out illegal states and moves has been emphasized as a crucial step to prevent expansion in directions where a player has already played, contributing to the overall optimization of the search process.', 'Rescaling policy to have percentages by dividing with its own sum, resulting in a sum of policy equal to one.', 'Updating the UCB formula by removing math.log, adding one to the visit count of the child, and 
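Inside the model-guided search this chapter describes, the policy handling (no-grad inference, masking illegal moves, renormalizing, reading the value as a float) could look roughly like this fragment. The evaluate helper name is only for illustration; in the tutorial this logic sits inline in MCTS.search:

```python
import numpy as np
import torch

@torch.no_grad()                      # inference only: no gradients needed, faster
def evaluate(model, game, state):
    policy, value = model(
        torch.tensor(game.get_encoded_state(state)).unsqueeze(0)
    )
    policy = torch.softmax(policy, dim=1).squeeze(0).numpy()

    # Mask out illegal moves, then rescale so the remaining priors sum to 1.
    valid_moves = game.get_valid_moves(state)
    policy *= valid_moves
    policy /= np.sum(policy)

    return policy, value.item()
```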
multiplying everything with the prior of the child for selection downwards the tree.']}, {'end': 9431.347, 'segs': [{'end': 8653.203, 'src': 'embed', 'start': 8630.168, 'weight': 3, 'content': [{'end': 8639.794, 'text': 'And if indeed the game has ended already, then we want to first of all return all of this data to the memory here.', 'start': 8630.168, 'duration': 9.626}, {'end': 8648.44, 'text': 'And the data should be structured in this tuple form, where for each instance, we have a given state,', 'start': 8640.115, 'duration': 8.325}, {'end': 8653.203, 'text': 'we have the action probabilities of our MCTS and then we have the final outcome.', 'start': 8648.44, 'duration': 4.763}], 'summary': 'Return game data in tuple form with state, action probabilities, and final outcome.', 'duration': 23.035, 'max_score': 8630.168, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB48630168.jpg'}, {'end': 9232.097, 'src': 'heatmap', 'start': 9069.489, 'weight': 0.802, 'content': [{'end': 9077.151, 'text': "So let's write save.game.getOpponentValue of value like this.", 'start': 9069.489, 'duration': 7.662}, {'end': 9083.632, 'text': 'So this is just more general if we call this method right here.', 'start': 9077.211, 'duration': 6.421}, {'end': 9085.933, 'text': 'So getOpponentValue.', 'start': 9084.593, 'duration': 1.34}, {'end': 9097.517, 'text': 'and the next thing would be that we want to visualize these loops right here, and the way we can do this is by having these progress bars,', 'start': 9087.211, 'duration': 10.306}, {'end': 9103.14, 'text': 'and we can get them by using the tqdm package.', 'start': 9097.517, 'duration': 5.623}, {'end': 9113.282, 'text': "so let's write import tqdm.notebook And you know what from tqdm.notebook.", 'start': 9103.14, 'duration': 10.142}, {'end': 9116.183, 'text': 'we want to import a T range like this', 'start': 9113.282, 'duration': 2.901}, {'end': 9125.869, 'text': 'So then we can just replace these range calls below with a T range.', 'start': 9117.684, 'duration': 8.185}, {'end': 9129.131, 'text': 'So just a small difference right here.', 'start': 9126.21, 'duration': 2.921}, {'end': 9137.036, 'text': 'And now we want to check that this ifr0 implementation is working as we have built it so far.', 'start': 9130.092, 'duration': 6.944}, {'end': 9143.44, 'text': 'So, We want to create an instance of AlphaZero.', 'start': 9138.697, 'duration': 4.743}, {'end': 9148.703, 'text': 'And in order to do that, we need a model, an optimizer, a game, and some arguments.', 'start': 9143.7, 'duration': 5.003}, {'end': 9153.666, 'text': 'So we get the model, first of all, by building a ResNet.', 'start': 9150.424, 'duration': 3.242}, {'end': 9157.449, 'text': "But initially, let's just create an instance of tic-tac-toe.", 'start': 9153.686, 'duration': 3.763}, {'end': 9161.652, 'text': "Let's write tic-tac-toe equals tic-tac-toe like this.", 'start': 9158.33, 'duration': 3.322}, {'end': 9166.115, 'text': 'And then we can write model equals ResNet.', 'start': 9163.373, 'duration': 2.742}, {'end': 9169.437, 'text': 'And for the game, we want to use this tic-tac-toe.', 'start': 9166.935, 'duration': 2.502}, {'end': 9171.439, 'text': "instance we've just created.", 'start': 9170.239, 'duration': 1.2}, {'end': 9175.601, 'text': 'And for the number of res blocks, we can just say four.', 'start': 9172.78, 'duration': 2.821}, {'end': 9180.102, 'text': 'And for the hidden dim, we can just say 64.', 'start': 9175.621, 
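A sketch of the selfPlay method outlined here, returning (state, action probabilities, outcome) tuples once the game ends; the outcome for each stored position is flipped with get_opponent_value when that position belonged to the losing player. Details such as encoding the stored states are reconstructed from the description, not verbatim:

```python
import numpy as np

def selfPlay(self):
    memory = []
    player = 1
    state = self.game.get_initial_state()

    while True:
        neutral_state = self.game.change_perspective(state, player)
        action_probs = self.mcts.search(neutral_state)
        memory.append((neutral_state, action_probs, player))

        # Sample the actual move from the MCTS visit-count distribution.
        action = np.random.choice(self.game.action_size, p=action_probs)
        state = self.game.get_next_state(state, action, player)

        value, is_terminal = self.game.get_value_and_terminated(state, action)
        if is_terminal:
            return_memory = []
            for hist_state, hist_probs, hist_player in memory:
                # +1 for positions of the winner, -1 for the loser, 0 for draws.
                hist_outcome = value if hist_player == player \
                    else self.game.get_opponent_value(value)
                return_memory.append(
                    (self.game.get_encoded_state(hist_state), hist_probs, hist_outcome)
                )
            return return_memory

        player = self.game.get_opponent(player)
```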
'duration': 4.481}, {'end': 9187.905, 'text': "And now we want to define an optimizer, right? So let's use the Adam optimizer as a built-in PyTorch.", 'start': 9180.102, 'duration': 7.803}, {'end': 9192.527, 'text': "So let's write optimizer equals torch.optim.adam.", 'start': 9188.485, 'duration': 4.042}, {'end': 9198.969, 'text': 'And then for the parameters, we want to use the parameters of our models, just model.parameters.', 'start': 9193.787, 'duration': 5.182}, {'end': 9207.191, 'text': 'And for the learning rate, we can just use 0.001, just standard learning rate for now.', 'start': 9201.288, 'duration': 5.903}, {'end': 9214.295, 'text': 'And next, we also need to define these arguments right here.', 'start': 9208.252, 'duration': 6.043}, {'end': 9219.938, 'text': 'And we can do this by creating just another dictionary.', 'start': 9215.676, 'duration': 4.262}, {'end': 9225.901, 'text': 'And for the exploration content, we can just choose two again.', 'start': 9221.299, 'duration': 4.602}, {'end': 9232.097, 'text': 'For the number of searches, we can choose 60.', 'start': 9226.461, 'duration': 5.636}], 'summary': 'Creating an instance of alphazero using resnet and tic-tac-toe game with specific parameters and an adam optimizer.', 'duration': 162.608, 'max_score': 9069.489, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB49069489.jpg'}, {'end': 9219.938, 'src': 'embed', 'start': 9188.485, 'weight': 1, 'content': [{'end': 9192.527, 'text': "So let's write optimizer equals torch.optim.adam.", 'start': 9188.485, 'duration': 4.042}, {'end': 9198.969, 'text': 'And then for the parameters, we want to use the parameters of our models, just model.parameters.', 'start': 9193.787, 'duration': 5.182}, {'end': 9207.191, 'text': 'And for the learning rate, we can just use 0.001, just standard learning rate for now.', 'start': 9201.288, 'duration': 5.903}, {'end': 9214.295, 'text': 'And next, we also need to define these arguments right here.', 'start': 9208.252, 'duration': 6.043}, {'end': 9219.938, 'text': 'And we can do this by creating just another dictionary.', 'start': 9215.676, 'duration': 4.262}], 'summary': 'Using torch.optim.adam for optimization with model parameters and learning rate of 0.001.', 'duration': 31.453, 'max_score': 9188.485, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB49188485.jpg'}, {'end': 9431.347, 'src': 'embed', 'start': 9365.367, 'weight': 0, 'content': [{'end': 9377.914, 'text': 'So that we basically have a batch index and for each batch index we can then sample a whole batch of different samples and then use these for training.', 'start': 9365.367, 'duration': 12.547}, {'end': 9381.156, 'text': 'So the way we do this is by writing for batch.', 'start': 9378.374, 'duration': 2.782}, {'end': 9384.628, 'text': 'index in range.', 'start': 9382.526, 'duration': 2.102}, {'end': 9386.871, 'text': 'And here we want to start at zero.', 'start': 9385.449, 'duration': 1.422}, {'end': 9389.033, 'text': 'We want to end at the length of our memory.', 'start': 9386.991, 'duration': 2.042}, {'end': 9394.058, 'text': 'And then we want to step in the size of self.args.BatchSize.', 'start': 9390.134, 'duration': 3.924}, {'end': 9411.014, 'text': 'And now we want to take a sample from our memory, right? 
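The setup described in these segments might be wired together like this. C = 2, 60 searches and a batch size of 64 come from the transcript; the remaining argument values are placeholders only:

```python
import torch

tictactoe = TicTacToe()
model = ResNet(tictactoe, 4, 64)                      # 4 res blocks, 64 hidden channels
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

args = {
    'C': 2,
    'num_searches': 60,
    'num_iterations': 3,             # placeholder values for the remaining settings
    'num_selfPlay_iterations': 500,
    'num_epochs': 4,
    'batch_size': 64,
}

alphazero = AlphaZero(model, optimizer, tictactoe, args)
alphazero.learn()
```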
And we can get this sample by calling the memory at the batch index at the beginning.', 'start': 9399.529, 'duration': 11.485}, {'end': 9419.798, 'text': 'And we want to call it until batch index plus self.args.batchSize.', 'start': 9412.174, 'duration': 7.624}, {'end': 9431.347, 'text': "But just remember that we don't want to call it on any batch index that might be higher than our len of the memory.", 'start': 9420.878, 'duration': 10.469}], 'summary': 'Using batch index, samples are selected for training in batches.', 'duration': 65.98, 'max_score': 9365.367, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB49365367.jpg'}], 'start': 8540.263, 'title': 'Implementing alphazero training', 'summary': 'Details implementing alphazero training through visualization, model instantiation, optimizer usage, and batch training, with a focus on progress bars, shuffling data, and looping over memory.', 'chapters': [{'end': 9085.933, 'start': 8540.263, 'title': 'Self-play method in alphazero', 'summary': 'Explains the self-play method in the alphazero implementation, detailing the process of generating training data through mcts search, action sampling, and outcome determination, with a focus on data structure and player outcomes.', 'duration': 545.67, 'highlights': ['The self-play method involves generating training data by calling MCTS search, sampling actions, and determining player outcomes for each game instance. The self-play method involves calling MCTS search, sampling actions using action probabilities, and determining player outcomes for each game instance.', 'The training data is structured in a tuple form, including the game state, action probabilities, and player outcome for each game instance. The training data is structured in a tuple form, including the game state, action probabilities, and player outcome for each game instance.', 'The player outcomes are determined based on whether the player won, lost, or drew the game, and the outcome data is appended to the memory for later use in training. 
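The minibatch slicing described here can be written as a small standalone helper: shuffle the collected memory, then step through it in batch_size chunks, clamping the upper index with min() exactly as in the video (Python slicing would clamp on its own, so the min() is belt-and-braces). Names are illustrative:

import random

def iterate_minibatches(memory, batch_size):
    # memory: list of (encoded_state, policy_target, value_target) tuples from self-play
    random.shuffle(memory)  # avoid training on the same ordering every epoch
    for batch_idx in range(0, len(memory), batch_size):
        yield memory[batch_idx : min(len(memory), batch_idx + batch_size)]

Each yielded chunk can then be unpacked into states, policy targets and value targets before being turned into tensors for the forward pass and the loss.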
The player outcomes are determined based on whether the player won, lost, or drew the game, and the outcome data is appended to the memory for later use in training.']}, {'end': 9431.347, 'start': 9087.211, 'title': 'Implementing alphazero training', 'summary': 'Explains how to visualize loops using progress bars, create an instance of alphazero with a model, optimizer, game, and arguments, and implement the training method by shuffling data and looping over memory in batches for training.', 'duration': 344.136, 'highlights': ['The chapter demonstrates the visualization of loops using progress bars from the tqdm package, allowing for a better understanding of the progress of operations.', 'The process of creating an instance of AlphaZero involves building a model, using a game instance, defining an optimizer, and setting various arguments, such as exploration constant, number of searches, self-play iterations, and epochs.', 'The implementation of the training method in AlphaZero includes shuffling training data to avoid repetitive batches and looping over all memory in batches, ensuring efficient and effective training of the model.']}], 'duration': 891.084, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB48540263.jpg', 'highlights': ['The implementation of the training method in AlphaZero includes shuffling training data to avoid repetitive batches and looping over all memory in batches, ensuring efficient and effective training of the model.', 'The chapter demonstrates the visualization of loops using progress bars from the tqdm package, allowing for a better understanding of the progress of operations.', 'The self-play method involves generating training data by calling MCTS search, sampling actions, and determining player outcomes for each game instance.', 'The training data is structured in a tuple form, including the game state, action probabilities, and player outcome for each game instance.', 'The player outcomes are determined based on whether the player won, lost, or drew the game, and the outcome data is appended to the memory for later use in training.']}, {'end': 11647.48, 'segs': [{'end': 9966.305, 'src': 'heatmap', 'start': 9817.709, 'weight': 1, 'content': [{'end': 9822.331, 'text': 'And I will now test this for the number of iterations we have here.', 'start': 9817.709, 'duration': 4.622}, {'end': 9828.354, 'text': 'And then afterwards, we can finally see how this model has learned.', 'start': 9823.432, 'duration': 4.922}, {'end': 9831.776, 'text': "And feel free to train as well, but you don't have to.", 'start': 9829.055, 'duration': 2.721}, {'end': 9834.357, 'text': 'And we will make it more efficient later on.', 'start': 9831.916, 'duration': 2.441}, {'end': 9837.279, 'text': 'So we can also just test it here on my machine.', 'start': 9834.417, 'duration': 2.862}, {'end': 9841.504, 'text': 'Okay, sorry, so I forgot to define a batch size right here.', 'start': 9838.462, 'duration': 3.042}, {'end': 9844.565, 'text': 'So let us just pick 64 as our default batch size.', 'start': 9841.584, 'duration': 2.981}, {'end': 9850.188, 'text': 'And then we can run the cell right here and we get these nice progress bars during training.', 'start': 9845.206, 'duration': 4.982}, {'end': 9858.193, 'text': 'And I have just trained the cell right here and it took roughly 15 minutes just using the CPU of my machine.', 'start': 9851.069, 'duration': 7.124}, {'end': 9863.055, 'text': 'So now we actually have trained for a total number 
of three iterations.', 'start': 9859.113, 'duration': 3.942}, {'end': 9870.418, 'text': 'And remember that we save our model at each iteration because of this expression right here.', 'start': 9863.775, 'duration': 6.643}, {'end': 9876.041, 'text': 'So now let us actually check what the neural network we have trained understands of the game.', 'start': 9871.059, 'duration': 4.982}, {'end': 9880.543, 'text': 'And we can do this by moving up to this cell right here.', 'start': 9876.921, 'duration': 3.622}, {'end': 9884.725, 'text': 'So this is just where we tested our randomly initiated model.', 'start': 9880.563, 'duration': 4.162}, {'end': 9890.047, 'text': 'So here, when we define our model, let us actually load the state dict.', 'start': 9885.485, 'duration': 4.562}, {'end': 9893.849, 'text': "So let's write model.loadStateDict.", 'start': 9891.348, 'duration': 2.501}, {'end': 9897.33, 'text': 'Inside, we can write torch.load.', 'start': 9895.429, 'duration': 1.901}, {'end': 9903.533, 'text': 'And for the path, we can write model and then two, since we want to check the last iteration.', 'start': 9898.271, 'duration': 5.262}, {'end': 9905.874, 'text': 'Then pt for the file ending.', 'start': 9904.253, 'duration': 1.621}, {'end': 9909.335, 'text': "And you know what? Let's also put our model in eval mode.", 'start': 9906.474, 'duration': 2.861}, {'end': 9910.816, 'text': "So let's just run this right here.", 'start': 9909.595, 'duration': 1.221}, {'end': 9919.147, 'text': 'And then for this state right here, we get this distribution of where to play.', 'start': 9912.601, 'duration': 6.546}, {'end': 9925.012, 'text': 'So our model tells us that either we should play in the middle right here as player positive one,', 'start': 9919.307, 'duration': 5.705}, {'end': 9929.355, 'text': 'or we should play on this field right here and the corner as player positive one.', 'start': 9925.012, 'duration': 4.343}, {'end': 9935.861, 'text': "And you know what? 
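A sketch of reloading the checkpoint described above ("model" plus the iteration index plus ".pt", so the last of the three iterations) and querying it for one position, assuming the course's ResNet and game classes; the exact filename spelling is an assumption since the transcript only spells it out loosely:

import torch

model = ResNet(tictactoe, 4, 64)                   # same architecture as during training
model.load_state_dict(torch.load("model_2.pt"))    # weights saved after the last (index 2) iteration
model.eval()                                       # inference mode for evaluation

with torch.no_grad():
    # `state` is whatever board position is being inspected in the notebook cell
    encoded_state = torch.tensor(tictactoe.get_encoded_state(state)).unsqueeze(0).float()
    policy_logits, value = model(encoded_state)
    policy = torch.softmax(policy_logits, dim=1).squeeze(0).numpy()  # distribution over the 9 squares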
Let's also test it for this state right here.", 'start': 9930.556, 'duration': 5.305}, {'end': 9940.625, 'text': 'So two and four is player negative one.', 'start': 9937.042, 'duration': 3.583}, {'end': 9951.029, 'text': "And then let's play a positive one.", 'start': 9944.642, 'duration': 6.387}, {'end': 9954.533, 'text': 'We have played on the fields of eight of six.', 'start': 9951.93, 'duration': 2.603}, {'end': 9966.305, 'text': 'So first of all, now you can see that we have copied the sports date from the image right here.', 'start': 9960.719, 'duration': 5.586}], 'summary': 'Trained model for 3 iterations, took 15 mins on cpu, yielded specific game strategy.', 'duration': 148.596, 'max_score': 9817.709, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB49817709.jpg'}, {'end': 10389.555, 'src': 'embed', 'start': 10361.566, 'weight': 5, 'content': [{'end': 10368.534, 'text': 'Then we want to power them by nine divided by self.args.temperature.', 'start': 10361.566, 'duration': 6.968}, {'end': 10371.979, 'text': 'So like this.', 'start': 10371.478, 'duration': 0.501}, {'end': 10377.05, 'text': 'And now we also have to define temperature in our arcs right here.', 'start': 10373.408, 'duration': 3.642}, {'end': 10382.952, 'text': "So let's write temperature and set that to 1.25 in this case.", 'start': 10377.71, 'duration': 5.242}, {'end': 10385.673, 'text': 'So now we have declared this temperature right here.', 'start': 10383.372, 'duration': 2.301}, {'end': 10389.555, 'text': 'And actually what this does.', 'start': 10386.513, 'duration': 3.042}], 'summary': 'Defining temperature as 1.25 to power calculation', 'duration': 27.989, 'max_score': 10361.566, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB410361566.jpg'}, {'end': 11159.255, 'src': 'heatmap', 'start': 11003.559, 'weight': 0.756, 'content': [{'end': 11014.747, 'text': 'And if this visit count of our parent is zero, then basically this means that All of this will basically move away,', 'start': 11003.559, 'duration': 11.188}, {'end': 11016.669, 'text': 'because we just turned to zero as well.', 'start': 11014.747, 'duration': 1.922}, {'end': 11026.198, 'text': "So this also means that we don't use our prior when we select a child for the first time.", 'start': 11017.71, 'duration': 8.488}, {'end': 11033.826, 'text': 'So actually, in order to make this better, we should just set the visit count of our root node to one at the beginning.', 'start': 11027.239, 'duration': 6.587}, {'end': 11037.828, 'text': "So you know what, let's write visit count.", 'start': 11034.666, 'duration': 3.162}, {'end': 11040.45, 'text': "Let's set that one right here.", 'start': 11038.969, 'duration': 1.481}, {'end': 11044.513, 'text': 'And this also means that we have to add visit count right here.', 'start': 11040.57, 'duration': 3.943}, {'end': 11046.515, 'text': 'And the default, it should just be zero.', 'start': 11044.533, 'duration': 1.982}, {'end': 11051.138, 'text': 'but yeah, for the root node, it should be one at the beginning,', 'start': 11046.515, 'duration': 4.623}, {'end': 11062.326, 'text': 'so that we immediately use the information of our prior when we select a child at the beginning of our multicolored research.', 'start': 11051.138, 'duration': 11.188}, {'end': 11065.204, 'text': 'okay, great.', 'start': 11063.843, 'duration': 1.361}, {'end': 11074.093, 'text': 'so now we can run all of these sets again and then see if 
this is working.', 'start': 11065.204, 'duration': 8.889}, {'end': 11092.89, 'text': 'so we also have to add Dirichlet epsilon here, and yeah, this should just be equal to 0.25 and then Dirichlet alpha, and I just set that to 0.3,', 'start': 11074.093, 'duration': 18.797}, {'end': 11094.39, 'text': 'like I told you.', 'start': 11092.89, 'duration': 1.5}, {'end': 11096.991, 'text': "Okay, so now let's test it out.", 'start': 11095.271, 'duration': 1.72}, {'end': 11105.295, 'text': 'Okay, so we still have a CPU device somewhere.', 'start': 11102.514, 'duration': 2.781}, {'end': 11109.977, 'text': "And let's see where that could be.", 'start': 11107.955, 'duration': 2.022}, {'end': 11112.999, 'text': "Oh, I haven't run this right here.", 'start': 11111.098, 'duration': 1.901}, {'end': 11114.1, 'text': 'This is the error, sorry.', 'start': 11113.019, 'duration': 1.081}, {'end': 11118.744, 'text': 'So here we have to also add a device.', 'start': 11114.881, 'duration': 3.863}, {'end': 11125.489, 'text': "And for this cell right here, you know what, let's just say torch.device of CPU.", 'start': 11119.304, 'duration': 6.185}, {'end': 11128.352, 'text': 'And then we can run this again.', 'start': 11125.509, 'duration': 2.843}, {'end': 11132.069, 'text': 'Okay, so perfect, now this is running.', 'start': 11130.307, 'duration': 1.762}, {'end': 11143.163, 'text': "And yeah, for this current setup with tic-tac-toe it shouldn't be that much faster, but later on, when we use more complex games,", 'start': 11133.011, 'duration': 10.152}, {'end': 11145.006, 'text': 'GPU support will be much nicer.', 'start': 11143.163, 'duration': 1.843}, {'end': 11147.349, 'text': 'And also this.', 'start': 11146.267, 'duration': 1.082}, {'end': 11159.255, 'text': 'These other tweaks should help to make our model more flexible and actually make it easier to always get to a perfect model at the end.', 'start': 11148.81, 'duration': 10.445}], 'summary': 'Updating visit count and adding device for improved model performance.', 'duration': 155.696, 'max_score': 11003.559, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB411003559.jpg'}, {'end': 11074.093, 'src': 'embed', 'start': 11046.515, 'weight': 4, 'content': [{'end': 11051.138, 'text': 'but yeah, for the root node, it should be one at the beginning,', 'start': 11046.515, 'duration': 4.623}, {'end': 11062.326, 'text': 'so that we immediately use the information of our prior when we select a child at the beginning of our multicolored research.', 'start': 11051.138, 'duration': 11.188}, {'end': 11065.204, 'text': 'okay, great.', 'start': 11063.843, 'duration': 1.361}, {'end': 11074.093, 'text': 'so now we can run all of these sets again and then see if this is working.', 'start': 11065.204, 'duration': 8.889}], 'summary': 'Setting the root node as one at the beginning, to utilize prior information in multicolored research.', 'duration': 27.578, 'max_score': 11046.515, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB411046515.jpg'}, {'end': 11289.728, 'src': 'embed', 'start': 11258.364, 'weight': 0, 'content': [{'end': 11262.287, 'text': 'We see that we get this nice distribution and this nice value again.', 'start': 11258.364, 'duration': 3.923}, {'end': 11265.25, 'text': 'We can also see that our model has learned.', 'start': 11263.028, 'duration': 2.222}, {'end': 11271.235, 'text': "By the way, here we aren't even masking out our policy.", 'start': 11265.99, 
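The tweaks discussed in these segments (temperature sampling, Dirichlet noise at the root, and starting the root's visit count at 1 so the prior is used immediately) can be sketched standalone in NumPy. The 1/temperature exponent is the standard AlphaZero form and is assumed here (the spoken "nine divided by" is presumably "one divided by"); epsilon = 0.25 and alpha = 0.3 come straight from the transcript:

import numpy as np

def apply_temperature(action_probs, temperature=1.25):
    # temperature > 1 flattens the distribution (more exploration), < 1 sharpens it (more exploitation)
    probs = action_probs ** (1 / temperature)
    return probs / probs.sum()

def add_root_dirichlet_noise(policy, valid_moves, epsilon=0.25, alpha=0.3):
    # mix the network prior with Dirichlet noise so self-play never completely ignores a move
    noise = np.random.dirichlet([alpha] * len(policy))
    policy = (1 - epsilon) * policy + epsilon * noise
    policy *= valid_moves            # mask illegal moves before renormalising
    return policy / policy.sum()

# in addition, the root node is created with visit_count=1 so the prior already
# weighs into the very first UCB-based child selection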
'duration': 5.245}, {'end': 11279, 'text': "So the model has learned itself that these moves aren't that great right here.", 'start': 11272.195, 'duration': 6.805}, {'end': 11288.868, 'text': "So you can't even play here because someone else has played without us even masking these illegal moves out.", 'start': 11280.121, 'duration': 8.747}, {'end': 11289.728, 'text': "So that's nice.", 'start': 11289.068, 'duration': 0.66}], 'summary': 'The model has learned to avoid illegal moves, achieving a nice distribution and value.', 'duration': 31.364, 'max_score': 11258.364, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB411258364.jpg'}, {'end': 11509.787, 'src': 'embed', 'start': 11454.537, 'weight': 1, 'content': [{'end': 11462.542, 'text': 'Then we want to look at all of the fields inside of this column that are equal to zero, so that are empty currently.', 'start': 11454.537, 'duration': 8.005}, {'end': 11472.849, 'text': 'Then we want to basically take the deepest empty field inside of this column, because this is what we would play in Connect for,', 'start': 11463.443, 'duration': 9.406}, {'end': 11477.292, 'text': 'and this would just then be the row we use for playing.', 'start': 11472.849, 'duration': 4.443}, {'end': 11481.235, 'text': 'We get this by calling np.max,', 'start': 11478.173, 'duration': 3.062}, {'end': 11493.083, 'text': 'since in a numper array you start at zero at the top and then your values get higher if you move down basically on the rows.', 'start': 11481.235, 'duration': 11.848}, {'end': 11498.887, 'text': 'We take np.max of np.where of the state at the given column?', 'start': 11493.363, 'duration': 5.524}, {'end': 11504.526, 'text': 'Yeah, and the state here should equal to zero, right?', 'start': 11500.205, 'duration': 4.321}, {'end': 11509.787, 'text': 'So yeah, np.where will just again give us the indices of all of the empty fields.', 'start': 11504.986, 'duration': 4.801}], 'summary': 'Identify and select the deepest empty field in a column with np.max and np.where, where the state equals zero.', 'duration': 55.25, 'max_score': 11454.537, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB411454537.jpg'}, {'end': 11571.901, 'src': 'embed', 'start': 11539.491, 'weight': 2, 'content': [{'end': 11545.737, 'text': 'we have given here as input and yeah, so now this is great.', 'start': 11539.491, 'duration': 6.246}, {'end': 11549.12, 'text': 'and next we want to move over to this method right here.', 'start': 11545.737, 'duration': 3.383}, {'end': 11564.795, 'text': 'so we have this these get valid moves right here and for the get valid moves we just want to check the row just at the top and then we want to see whether the fields at this row are equal to zero or not.', 'start': 11549.12, 'duration': 15.675}, {'end': 11567.818, 'text': 'and these are our value moves.', 'start': 11564.795, 'duration': 3.023}, {'end': 11571.901, 'text': 'so we can just write state of zero.', 'start': 11567.818, 'duration': 4.083}], 'summary': 'Method to check for valid moves in a row to write state of zero', 'duration': 32.41, 'max_score': 11539.491, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB411539491.jpg'}], 'start': 9432.088, 'title': 'Model training and optimization for alphazero', 'summary': 'Delves into optimizing loss function for model training, training the alphazero model, implementing gpu support and 
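The Connect Four move logic described above reduces to two small NumPy helpers: a column is playable while its top cell is empty, and a dropped stone lands in the deepest empty row of its column (row indices grow downwards, so np.max picks the lowest free cell). A standalone sketch:

import numpy as np

ROW_COUNT, COLUMN_COUNT = 6, 7

def get_valid_moves(state):
    # look only at the top row: a column is open while its top field is still 0
    return (state[0] == 0).astype(np.uint8)

def get_next_state(state, action, player):
    # np.where gives the row indices of all empty cells in the column, np.max picks the deepest one
    row = np.max(np.where(state[:, action] == 0))
    state = state.copy()
    state[row, action] = player
    return state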
tweaks, monte carlo tree search algorithm, and model training for tic-tac-toe and expansion to connect four, with practical examples and quantifiable data.', 'chapters': [{'end': 9784.682, 'start': 9432.088, 'title': 'Optimizing loss function for model training', 'summary': 'Discusses the process of preparing and optimizing tensors, applying them to a model, defining loss functions for policy and value, and minimizing the loss through backpropagation using pytorch, with practical examples and explanations.', 'duration': 352.594, 'highlights': ['Preparing and transforming data into tensors for model input The process involves converting the states, policy targets, and value targets into numpy arrays and subsequently transforming them into tensors using PyTorch, ensuring the appropriate data type and structure for efficient model input.', 'Defining and utilizing loss functions for policy and value The chapter explains the use of multi-target cross entropy loss for policy and mean squared error loss for value, essential in quantifying the deviation between predicted values and actual targets.', 'Minimizing loss through backpropagation The process involves minimizing the cumulative loss by backpropagating and optimizing the model parameters using the defined loss functions, enabling the model to iteratively improve its predictions.']}, {'end': 10039.719, 'start': 9787.492, 'title': 'Training alphazero model', 'summary': "Describes the training process for an alphazero model, including optimizing the model, defining batch size, training iterations, and evaluating the neural network's understanding of the game, achieving a 15-minute training time using the cpu for three iterations, and demonstrating the neural network's ability to make strategic game decisions.", 'duration': 252.227, 'highlights': ['The neural network was trained for a total of three iterations, achieving a training time of roughly 15 minutes using the CPU of the machine.', 'The neural network correctly identified strategic game moves, providing a distribution of where to play and evaluating the potential outcome of each move.', 'The process involved optimizing the model using backpropagation and self-play, with a specific focus on training efficiency and the ability to make strategic game decisions.']}, {'end': 10531.634, 'start': 10040.459, 'title': 'Implementing gpu support and adding tweaks', 'summary': 'Discusses implementing gpu support for faster training, adding weight decay of 0.001, and introducing a temperature parameter for flexibility in action sampling. 
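A short sketch of the GPU and regularisation tweaks summarised here: pick CUDA when it is available, move the model there, and add the weight decay of 0.001 (L2 regularisation) to the Adam optimizer. Any tensor fed to the model then needs the same device:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.001)

# inputs have to live on the same device as the model, e.g.
# x = torch.tensor(encoded_state, dtype=torch.float32, device=device)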
it also covers adding noise to the policy for root node exploration in alphazero.', 'duration': 491.175, 'highlights': ['Implementing GPU support by declaring a device, setting it to CUDA for NVIDIA GPU if available, and falling back to CPU if not, resulting in faster training.', 'Adding weight decay of 0.001 to incorporate L2 regularization in the loss for AlphaZero, enhancing model performance.', 'Introducing a temperature parameter for action sampling flexibility, allowing for exploration or exploitation based on the temperature value.', 'Adding noise to the policy for the root node in Monte Carlo tree search to facilitate exploration and avoid missing promising actions.']}, {'end': 11074.093, 'start': 10531.634, 'title': 'Monte carlo tree search algorithm', 'summary': 'Explains the implementation of the monte carlo tree search algorithm, including adding policy, value, noise, and visit count to the root node, with the goal of enhancing exploration during the search process.', 'duration': 542.459, 'highlights': ['The policy and value are obtained by calling self.model and torch.tensor, and then adjusting the policy using softmax to incorporate randomness and exploration, aiming to improve the decision-making process during the Monte Carlo tree search.', 'The addition of noise to the policy involves multiplying the old policy with a coefficient smaller than one and then adding random noise generated using np.random.dirichlet, resulting in a modified policy that encourages exploration and randomness during the search process.', 'Setting the visit count of the root node to one at the beginning enables the immediate utilization of prior information when selecting a child during the Monte Carlo tree search, thus improving the decision-making process and exploration at the initial stages.']}, {'end': 11647.48, 'start': 11074.093, 'title': 'Model training and expansion to connect four', 'summary': 'Covers model training for tic-tac-toe and the expansion to connect four, including setting device, model learning, and game representation.', 'duration': 573.387, 'highlights': ['The chapter includes setting the device for model training, using torch.device to define device as CPU or CUDA based on availability, and updating model and tensor state to utilize the defined device.', "The model is trained for tic-tac-toe and weights are uploaded for accessibility, with the plan to expand the agent's capability to play Connect Four as well.", 'The game of Connect Four is defined, specifying parameters such as row count, column count, action size, and the number of stones needed in a row to win the game.', 'The method for obtaining valid moves and checking for a win in Connect Four is explained, involving checking rows and columns for valid moves and examining various win possibilities including vertical, horizontal, and diagonal.', 'The representation method for tic-tac-toe is added to facilitate model saving and differentiation between games during training, and the Connect Four game representation is defined with its initial state.']}], 'duration': 2215.392, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB49432088.jpg', 'highlights': ['Implementing GPU support for faster training by declaring a device and setting it to CUDA if available, falling back to CPU if not.', 'Adding weight decay of 0.001 to incorporate L2 regularization in the loss for AlphaZero, enhancing model performance.', 'Introducing a temperature parameter for action sampling 
flexibility, allowing for exploration or exploitation based on the temperature value.', 'Adding noise to the policy for the root node in Monte Carlo tree search to facilitate exploration and avoid missing promising actions.', 'The neural network was trained for a total of three iterations, achieving a training time of roughly 15 minutes using the CPU of the machine.', "The model is trained for tic-tac-toe and weights are uploaded for accessibility, with the plan to expand the agent's capability to play Connect Four as well."]}, {'end': 12264.08, 'segs': [{'end': 11712.845, 'src': 'embed', 'start': 11672.544, 'weight': 1, 'content': [{'end': 11675.466, 'text': 'So we can do this by making our code more general.', 'start': 11672.544, 'duration': 2.922}, {'end': 11679.648, 'text': 'And here we just replace tic-tac-toe with game.', 'start': 11676.387, 'duration': 3.261}, {'end': 11682.889, 'text': 'Then we use game inside here.', 'start': 11680.608, 'duration': 2.281}, {'end': 11686.489, 'text': 'And we use game inside here as well.', 'start': 11684.329, 'duration': 2.16}, {'end': 11689.17, 'text': 'Use game like this.', 'start': 11687.57, 'duration': 1.6}, {'end': 11691.991, 'text': 'Use game here.', 'start': 11690.97, 'duration': 1.021}, {'end': 11696.792, 'text': 'And also game right here.', 'start': 11695.531, 'duration': 1.261}, {'end': 11702.613, 'text': "So let's remove this instance of tic-tac-toe.", 'start': 11699.652, 'duration': 2.961}, {'end': 11706.282, 'text': 'And yeah, this one as well.', 'start': 11704.581, 'duration': 1.701}, {'end': 11712.845, 'text': 'And yeah, so this should be fine.', 'start': 11711.224, 'duration': 1.621}], 'summary': 'Generalize code by replacing tic-tac-toe with game, removing instances and ensuring functionality.', 'duration': 40.301, 'max_score': 11672.544, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB411672544.jpg'}, {'end': 12053.787, 'src': 'embed', 'start': 12028.206, 'weight': 0, 'content': [{'end': 12032.97, 'text': 'And for the iteration, we will just use zero since we started zero and we have only trained for one iteration.', 'start': 12028.206, 'duration': 4.764}, {'end': 12038.034, 'text': 'And for the game, we will use connect four and for the file ending PT again.', 'start': 12033.81, 'duration': 4.224}, {'end': 12040.916, 'text': "And let's also define a map location.", 'start': 12038.054, 'duration': 2.862}, {'end': 12043.278, 'text': 'and set that equal to device.', 'start': 12041.916, 'duration': 1.362}, {'end': 12051.785, 'text': 'So this way, if your device would change during training compared to evaluation, you could still load the static like this right here.', 'start': 12043.898, 'duration': 7.887}, {'end': 12053.787, 'text': 'So yeah, this might be nice.', 'start': 12052.666, 'duration': 1.121}], 'summary': 'Trained for one iteration on connect four game, using file ending pt.', 'duration': 25.581, 'max_score': 12028.206, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB412028206.jpg'}, {'end': 12275.634, 'src': 'embed', 'start': 12243.91, 'weight': 2, 'content': [{'end': 12249.333, 'text': 'And this way, we can drastically reduce the amount of times that we call our model in the first place.', 'start': 12243.91, 'duration': 5.423}, {'end': 12254.896, 'text': 'And yeah, thus, on one hand, we fully utilize our GPU capacities.', 'start': 12250.354, 'duration': 4.542}, {'end': 12264.08, 'text': 'And on the other hand, 
we can just decrease the amount of times we call our model and thus also have a higher speed right?', 'start': 12255.116, 'duration': 8.964}, {'end': 12275.634, 'text': 'So The way we can change this or update our AlphaZero implementation is that, first of all, we want to just copy this class here over.', 'start': 12265.48, 'duration': 10.154}], 'summary': 'Reducing model calls leads to higher gpu utilization and faster speed.', 'duration': 31.724, 'max_score': 12243.91, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB412243910.jpg'}], 'start': 11648.201, 'title': 'Alphazero model for connect four', 'summary': 'Details the modification and testing of the alphazero model for connect four, covering changes to the code, training iterations, model parameters, and efficiency of training. it also discusses improving the implementation by evaluating the model, loading the state dict, and parallelizing the implementation for increased speed during self-play.', 'chapters': [{'end': 11976.104, 'start': 11648.201, 'title': 'Modifying and testing alphazero model for connect four', 'summary': 'Details the modification and testing of the alphazero model for the game of connect four, including changes to the code, training iterations, and model parameters, with a final note on the efficiency of training.', 'duration': 327.903, 'highlights': ["The model is modified for the game of Connect Four, with changes made to the code, including replacing instances of 'tic-tac-toe' with 'game' and altering model parameters such as the number of res blocks and hidden dimensions.", 'The training iterations and hyperparameters for the AlphaZero model are adjusted, with the number of searches increased to 600, the number of iterations set to eight, and the batch size raised to 128 for the game of Connect Four.', 'Efficiency concerns are addressed, with the recommendation to use pre-uploaded weights to avoid the time-consuming training process, as even one iteration took several hours to complete.']}, {'end': 12264.08, 'start': 11976.224, 'title': 'Improving alphazero implementation', 'summary': 'Discusses training alphazero model for connect four, evaluating the model, loading the state dict from saved weights, and parallelizing the alphazero implementation to increase speed during self-play.', 'duration': 287.856, 'highlights': ['The chapter discusses parallelizing the AlphaZero implementation to increase speed during self-play by batching up states and distributing policy and value predictions to play several self-play games in parallel, resulting in a drastic increase in speed.', "The chapter covers loading the state dict from the weights that were saved after training for one iteration, and evaluating the model's performance for connect four, demonstrating improvement compared to the initial model.", 'The chapter emphasizes the need to increase the efficiency of the AlphaZero implementation by parallelizing as much as possible, utilizing GPU capacities and reducing the number of times the model is called to achieve higher speed during self-play.']}], 'duration': 615.879, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB411648201.jpg', 'highlights': ['The training iterations and hyperparameters for the AlphaZero model are adjusted, with the number of searches increased to 600, the number of iterations set to eight, and the batch size raised to 128 for the game of Connect Four.', 'The chapter discusses 
parallelizing the AlphaZero implementation to increase speed during self-play by batching up states and distributing policy and value predictions to play several self-play games in parallel, resulting in a drastic increase in speed.', "The chapter covers loading the state dict from the weights that were saved after training for one iteration, and evaluating the model's performance for connect four, demonstrating improvement compared to the initial model.", 'Efficiency concerns are addressed, with the recommendation to use pre-uploaded weights to avoid the time-consuming training process, as even one iteration took several hours to complete.']}, {'end': 12888.287, 'segs': [{'end': 12437.496, 'src': 'embed', 'start': 12375.669, 'weight': 0, 'content': [{'end': 12379.631, 'text': 'And we also want to store node here, this should also be set to none at the beginning.', 'start': 12375.669, 'duration': 3.962}, {'end': 12384.394, 'text': 'And so this is everything we need here for this class right here.', 'start': 12380.672, 'duration': 3.722}, {'end': 12388.443, 'text': 'And yeah, so this way we can later store some information here.', 'start': 12385.581, 'duration': 2.862}, {'end': 12395.087, 'text': 'Okay, so now we want to update this self-play method right here.', 'start': 12389.603, 'duration': 5.484}, {'end': 12402.291, 'text': 'So, first of all, we want to call it less often, since every time we call this method,', 'start': 12396.147, 'duration': 6.144}, {'end': 12407.854, 'text': 'we will call it 400 potential games that are played in parallel, right?', 'start': 12402.291, 'duration': 5.563}, {'end': 12417.284, 'text': 'So here, when we loop over our numSelfPlayIterations, we want to divide that with the games who play parallel.', 'start': 12408.594, 'duration': 8.69}, {'end': 12425.093, 'text': 'So self.args of numParallel games like this.', 'start': 12418.565, 'duration': 6.528}, {'end': 12434.495, 'text': "And also when we create this SavePlayGame class right here, I've just forgot to add those brackets right here, so we should just add them here.", 'start': 12425.993, 'duration': 8.502}, {'end': 12437.496, 'text': 'So inside here.', 'start': 12435.836, 'duration': 1.66}], 'summary': 'Update self-play method to call 400 potential games, divide iterations by numparallel games, and fix syntax error in saveplaygame class.', 'duration': 61.827, 'max_score': 12375.669, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB412375669.jpg'}, {'end': 12575.145, 'src': 'embed', 'start': 12546.455, 'weight': 5, 'content': [{'end': 12549.558, 'text': 'Now we have these self-play games right here.', 'start': 12546.455, 'duration': 3.103}, {'end': 12557.967, 'text': 'Next. 
we want to change this while true expression here, because This way we would stop after one game has finished,', 'start': 12549.998, 'duration': 7.969}, {'end': 12566.657, 'text': 'but rather we want to keep running our self-play method until all of our self-play games are finished.', 'start': 12557.967, 'duration': 8.69}, {'end': 12575.145, 'text': 'We can check this by saying while len is larger than zero.', 'start': 12566.777, 'duration': 8.368}], 'summary': 'Adjust while loop to run self-play method until all self-play games are finished.', 'duration': 28.69, 'max_score': 12546.455, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB412546455.jpg'}, {'end': 12739.293, 'src': 'embed', 'start': 12677.165, 'weight': 6, 'content': [{'end': 12681.569, 'text': 'And yeah, here we can just do the same, right? So this is great.', 'start': 12677.165, 'duration': 4.404}, {'end': 12688.234, 'text': 'And now we can also pass these neutral states to our Monte Carlo tree search.', 'start': 12681.929, 'duration': 6.305}, {'end': 12691.516, 'text': "And we've basically finished this part right here.", 'start': 12689.435, 'duration': 2.081}, {'end': 12696.897, 'text': "So let's now change the Monte Carlo Tree Search search method.", 'start': 12691.636, 'duration': 5.261}, {'end': 12701.639, 'text': 'So first of all, we want to now give states here as input.', 'start': 12696.917, 'duration': 4.722}, {'end': 12704.2, 'text': "So these are the neutral states we've just created.", 'start': 12702.019, 'duration': 2.181}, {'end': 12710.753, 'text': 'And now we can also First of all, change the order.', 'start': 12705.52, 'duration': 5.233}, {'end': 12717.656, 'text': "Let's just paste this one here below.", 'start': 12711.953, 'duration': 5.703}, {'end': 12721.898, 'text': 'Then we can work on the policy and the value first.', 'start': 12719.357, 'duration': 2.541}, {'end': 12729.246, 'text': "Let's move it like this.", 'start': 12724.162, 'duration': 5.084}, {'end': 12731.047, 'text': 'First of all, we get this policy and this value.', 'start': 12729.266, 'duration': 1.781}, {'end': 12735.69, 'text': 'Like I said, we can also do this with all of our batched states.', 'start': 12731.087, 'duration': 4.603}, {'end': 12739.293, 'text': 'This way, we just get several policies and values back here as a return.', 'start': 12735.89, 'duration': 3.403}], 'summary': 'Implementing monte carlo tree search with neutral states and batched inputs.', 'duration': 62.128, 'max_score': 12677.165, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB412677165.jpg'}, {'end': 12794.646, 'src': 'embed', 'start': 12766.712, 'weight': 1, 'content': [{'end': 12775.977, 'text': 'Because this getEncodedState method create these three planes for the fields in which player negative one played, or the fields that are empty,', 'start': 12766.712, 'duration': 9.265}, {'end': 12778.338, 'text': 'or the fields in which player positive one has played right?', 'start': 12775.977, 'duration': 2.361}, {'end': 12781.76, 'text': 'So it will always, first of all, give us these three planes.', 'start': 12778.859, 'duration': 2.901}, {'end': 12786.402, 'text': 'And then inside of these planes, it will give us the fields right?', 'start': 12781.78, 'duration': 4.622}, {'end': 12794.646, 'text': 'But we want to change the order here because the first axis, we basically want to have all of our different states.', 'start': 12786.642, 'duration': 
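The parallel search described in these segments batches all game states through the network in one call. A hedged sketch of that forward pass: in the video the axis swap ends up inside getEncodedState, here it is shown inline so the shape handling is visible, and the exact shapes depend on how the encoder stacks its planes:

import numpy as np
import torch

# states: array of shape (num_parallel_games, rows, cols), already flipped to the neutral perspective
encoded = game.get_encoded_state(states)   # plane-first layout, roughly (3, num_games, rows, cols)
encoded = np.swapaxes(encoded, 0, 1)       # batch-first layout, (num_games, 3, rows, cols)

with torch.no_grad():
    policy_logits, value = model(torch.tensor(encoded, dtype=torch.float32, device=device))
policy = torch.softmax(policy_logits, dim=1).cpu().numpy()   # one policy row per parallel game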
8.004}], 'summary': 'The getencodedstate method creates three planes for player -1, empty, and player +1 fields, providing the fields within each plane.', 'duration': 27.934, 'max_score': 12766.712, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB412766712.jpg'}], 'start': 12265.48, 'title': 'Updating alphazero implementation and storing self-play game information', 'summary': 'Discusses updating the alphazero implementation with parallel classes alphazeroparallel and mcts parallel, modifying safeplay and search methods, creating a class to store self-play game information, updating the self-play method, and modifying the monte carlo tree search search method.', 'chapters': [{'end': 12320.711, 'start': 12265.48, 'title': 'Updating alphazero implementation', 'summary': 'Discusses updating the alphazero implementation by creating parallel classes alphazeroparallel and mcts parallel, and modifying the safeplay and search methods.', 'duration': 55.231, 'highlights': ['Creating parallel classes AlphaZeroParallel and MCTS parallel to update the implementation.', 'Modifying the SafePlay method and the search method to accommodate the changes.', 'The need to update the AlphaZero and MCTS classes for the implementation changes.']}, {'end': 12888.287, 'start': 12321.892, 'title': 'Storing self-play game information', 'summary': 'Discusses creating a class to store information of self-play games, updating the self-play method, and modifying the monte carlo tree search search method.', 'duration': 566.395, 'highlights': ["The class 'SPG' is created to store information of self-play games, including the game state, memory, root, and node.", "The self-play method is updated to be called less often, with 400 potential games played in parallel and the 'numParallelGames' value added when creating 'SavePlayGame' class.", 'The while loop for the self-play method is modified to continue running until all self-play games are finished, and the states of the self-play games are obtained and turned into a NumPy array.', 'The perspective for all states is changed using a single function call for efficiency, and the neutral states are passed to the Monte Carlo tree search.', "The Monte Carlo Tree Search search method is modified to take neutral states as input and return policies and values for batched states, requiring updates to the 'unsqueeze' call and the 'getEncodedState' method to swap the axis."]}], 'duration': 622.807, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB412265480.jpg', 'highlights': ['Creating parallel classes AlphaZeroParallel and MCTS parallel to update the implementation.', 'Modifying the SafePlay method and the search method to accommodate the changes.', 'The need to update the AlphaZero and MCTS classes for the implementation changes.', "The class 'SPG' is created to store information of self-play games, including the game state, memory, root, and node.", "The self-play method is updated to be called less often, with 400 potential games played in parallel and the 'numParallelGames' value added when creating 'SavePlayGame' class.", 'The while loop for the self-play method is modified to continue running until all self-play games are finished, and the states of the self-play games are obtained and turned into a NumPy array.', 'The perspective for all states is changed using a single function call for efficiency, and the neutral states are passed to the Monte Carlo tree search.', 
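A sketch of the SPG container and the outer self-play loop described in this chapter; the field names state, memory, root and node come from the transcript, and the loop count is divided by num_parallel_games because each call now plays that many games at once:

class SPG:
    # one self-play game that is advanced in lockstep with the other parallel games
    def __init__(self, game):
        self.state = game.get_initial_state()
        self.memory = []    # (neutral_state, action_probs, player) tuples for this game
        self.root = None    # root of the current MCTS tree
        self.node = None    # node reached by selection, waiting to be expanded and backpropagated

for _ in range(args['num_selfPlay_iterations'] // args['num_parallel_games']):
    spGames = [SPG(game) for _ in range(args['num_parallel_games'])]
    while len(spGames) > 0:   # keep searching and moving until every parallel game has finished
        ...                   # batched MCTS search, move selection, removal of finished games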
"The Monte Carlo Tree Search search method is modified to take neutral states as input and return policies and values for batched states, requiring updates to the 'unsqueeze' call and the 'getEncodedState' method to swap the axis."]}, {'end': 14872.747, 'segs': [{'end': 13013.488, 'src': 'embed', 'start': 12976.644, 'weight': 0, 'content': [{'end': 12981.089, 'text': 'And now we have this process policy ready for us.', 'start': 12976.644, 'duration': 4.445}, {'end': 12988.834, 'text': 'And the next step would be to allocate this policy to all of our self-play games.', 'start': 12982.07, 'duration': 6.764}, {'end': 12996.959, 'text': 'So we first of all have to also add the self-play games here to our MCTS search as the argument.', 'start': 12989.855, 'duration': 7.104}, {'end': 13002.083, 'text': "And let's also add them here because of that.", 'start': 12998.26, 'duration': 3.823}, {'end': 13006.105, 'text': 'and now we can work with them.', 'start': 13004.584, 'duration': 1.521}, {'end': 13013.488, 'text': 'Here, we basically want to loop over all of our self-play games.', 'start': 13008.366, 'duration': 5.122}], 'summary': 'Prepare process policy, allocate to self-play games, and integrate with mcts search.', 'duration': 36.844, 'max_score': 12976.644, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB412976644.jpg'}, {'end': 13132.658, 'src': 'embed', 'start': 13107.272, 'weight': 4, 'content': [{'end': 13118.543, 'text': 'And then also we have to make sure that this root will be equal to sbg.root, right? Because we want to store it inside of this class right here.', 'start': 13107.272, 'duration': 11.271}, {'end': 13124.177, 'text': 'Okay, so now we have this sbg.root and we can also expand it.', 'start': 13120.996, 'duration': 3.181}, {'end': 13126.917, 'text': "So let's write sbg.root.expand like this.", 'start': 13124.197, 'duration': 2.72}, {'end': 13132.658, 'text': 'And then we will expand it with the SavePlayGamePolicy we have here.', 'start': 13127.837, 'duration': 4.821}], 'summary': 'Setting sbg.root equal to root, expanding sbg.root with saveplaygamepolicy.', 'duration': 25.386, 'max_score': 13107.272, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB413107272.jpg'}, {'end': 13894.488, 'src': 'embed', 'start': 13867.551, 'weight': 3, 'content': [{'end': 13879.479, 'text': 'because then we would basically not have a perfect alignment between this index here and the actual safe play game we want to reach if we mutate our list instead of our loop.', 'start': 13867.551, 'duration': 11.928}, {'end': 13885.302, 'text': 'So we can fix this by flipping this range here around.', 'start': 13880.059, 'duration': 5.243}, {'end': 13890.325, 'text': 'So now we want to loop over this flipped range of our safe play games.', 'start': 13886.303, 'duration': 4.022}, {'end': 13894.488, 'text': 'And then we can just tap all of these things in right here.', 'start': 13891.586, 'duration': 2.902}], 'summary': 'Flipping the range can align index with safe play game.', 'duration': 26.937, 'max_score': 13867.551, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB413867551.jpg'}, {'end': 14366.418, 'src': 'embed', 'start': 14339.424, 'weight': 1, 'content': [{'end': 14343.445, 'text': "So let's move on top of this position here.", 'start': 14339.424, 'duration': 4.021}, {'end': 14344.405, 'text': "So let's play three.", 'start': 14343.465, 
'duration': 0.94}, {'end': 14347.706, 'text': 'So the model played here.', 'start': 14346.426, 'duration': 1.28}, {'end': 14351.747, 'text': 'We want to get some pressure here.', 'start': 14349.566, 'duration': 2.181}, {'end': 14354.027, 'text': "So maybe we'd like to play here.", 'start': 14351.787, 'duration': 2.24}, {'end': 14354.908, 'text': 'We also defend here.', 'start': 14354.047, 'duration': 0.861}, {'end': 14358.068, 'text': "So let's play on two, I guess.", 'start': 14354.948, 'duration': 3.12}, {'end': 14360.329, 'text': 'Oh, no.', 'start': 14360.129, 'duration': 0.2}, {'end': 14364.097, 'text': 'Then if we play here, our model will play here and we would be trapped.', 'start': 14360.796, 'duration': 3.301}, {'end': 14366.418, 'text': "So we can't play here as well anymore.", 'start': 14364.717, 'duration': 1.701}], 'summary': 'Discussion about gameplay strategy, focusing on positions and moves.', 'duration': 26.994, 'max_score': 14339.424, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB414339424.jpg'}, {'end': 14871.727, 'src': 'heatmap', 'start': 14718.443, 'weight': 0.802, 'content': [{'end': 14727.411, 'text': 'And then here for the players, we can just fill this list with first of all, stating player1.run and then player2.run like this.', 'start': 14718.443, 'duration': 8.968}, {'end': 14729.913, 'text': 'Okay, so this should be working now.', 'start': 14727.911, 'duration': 2.002}, {'end': 14733.836, 'text': "Let's just run the cell and get these nice visualizations.", 'start': 14729.953, 'duration': 3.883}, {'end': 14740.852, 'text': 'oh so now we get this neat animation of our models playing against each other.', 'start': 14736.027, 'duration': 4.825}, {'end': 14744.255, 'text': 'so we have two players, but just one model here playing against itself.', 'start': 14740.852, 'duration': 3.403}, {'end': 14748.479, 'text': 'basically, and i believe this should come to a draw.', 'start': 14744.255, 'duration': 4.224}, {'end': 14752.343, 'text': 'yeah, so our model is that advanced.', 'start': 14748.479, 'duration': 3.864}, {'end': 14756.027, 'text': 'i guess that it can defend against all attacks.', 'start': 14752.343, 'duration': 3.684}, {'end': 14759.43, 'text': 'so in this still just this nice animation.', 'start': 14756.027, 'duration': 3.403}, {'end': 14765.92, 'text': 'And now we can also just briefly do the same for tic-tac-toe as well.', 'start': 14760.452, 'duration': 5.468}, {'end': 14769.265, 'text': 'So here we just have to change some small things.', 'start': 14766.681, 'duration': 2.584}, {'end': 14774.633, 'text': 'So first of all, we have to set the game to tic-tac-toe.', 'start': 14769.826, 'duration': 4.807}, {'end': 14781.164, 'text': 'And then also, we can change our arguments.', 'start': 14777.583, 'duration': 3.581}, {'end': 14784.344, 'text': "So let's just set number of searches to 100.", 'start': 14781.184, 'duration': 3.16}, {'end': 14790.866, 'text': 'And here, for our resnet, we can set the number of res blocks to four and the hidden dim to 64, I guess.', 'start': 14784.344, 'duration': 6.522}, {'end': 14793.366, 'text': 'And then we want to update this path.', 'start': 14791.666, 'duration': 1.7}, {'end': 14798.968, 'text': "So I think the last tic-tac-toe model we trained was model2.pt, if I'm not mistaken.", 'start': 14793.506, 'duration': 5.462}, {'end': 14803.669, 'text': 'So then also here, we have to set this to tic-tac-toe as well.', 'start': 14799.048, 'duration': 4.621}, {'end': 
14805.189, 'text': 'And then we can just run this.', 'start': 14804.249, 'duration': 0.94}, {'end': 14811.666, 'text': 'Okay, so now we get this nice animation immediately of our models playing against each other.', 'start': 14806.462, 'duration': 5.204}, {'end': 14817.731, 'text': 'And again, we have gotten a draw because the model is able to defend all possible attacks.', 'start': 14811.786, 'duration': 5.945}, {'end': 14820.093, 'text': 'So this is still very nice.', 'start': 14818.852, 'duration': 1.241}, {'end': 14824.296, 'text': 'So feel free to do some further experiments at your own.', 'start': 14820.673, 'duration': 3.623}, {'end': 14826.858, 'text': 'So maybe you could also set the search to false,', 'start': 14824.396, 'duration': 2.462}, {'end': 14833.684, 'text': 'so that just your neural networks directly will play against each other without doing these hundred searches in this case.', 'start': 14826.858, 'duration': 6.826}, {'end': 14837.005, 'text': "And yeah, so I think we're finished with this tutorial.", 'start': 14834.604, 'duration': 2.401}, {'end': 14838.066, 'text': 'This was a lot of fun.', 'start': 14837.065, 'duration': 1.001}, {'end': 14846.03, 'text': "And I've created this GitHub repository where there's a Jupyter Notebook stored for each checkpoint.", 'start': 14838.946, 'duration': 7.084}, {'end': 14854.134, 'text': 'And then also I have this weights folder where we have the last model for tic-tac-toe and the last model for Connect4 that we have trained.', 'start': 14846.07, 'duration': 8.064}, {'end': 14861.022, 'text': 'And if there are any questions, feel free to ask them either in the comments or just by sending an email, for example.', 'start': 14855.26, 'duration': 5.762}, {'end': 14867.325, 'text': "And yeah, I might do a follow-up video on Mu0 since I've also built that algorithm from scratch.", 'start': 14862.003, 'duration': 5.322}, {'end': 14871.727, 'text': "So if you're interested in that, it would be nice to let me know.", 'start': 14867.925, 'duration': 3.802}], 'summary': 'A tutorial demonstrating models playing connect4 and tic-tac-toe, resulting in draws, with suggested experiments and available resources.', 'duration': 153.284, 'max_score': 14718.443, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB414718443.jpg'}, {'end': 14871.727, 'src': 'embed', 'start': 14846.07, 'weight': 2, 'content': [{'end': 14854.134, 'text': 'And then also I have this weights folder where we have the last model for tic-tac-toe and the last model for Connect4 that we have trained.', 'start': 14846.07, 'duration': 8.064}, {'end': 14861.022, 'text': 'And if there are any questions, feel free to ask them either in the comments or just by sending an email, for example.', 'start': 14855.26, 'duration': 5.762}, {'end': 14867.325, 'text': "And yeah, I might do a follow-up video on Mu0 since I've also built that algorithm from scratch.", 'start': 14862.003, 'duration': 5.322}, {'end': 14871.727, 'text': "So if you're interested in that, it would be nice to let me know.", 'start': 14867.925, 'duration': 3.802}], 'summary': 'Weights folder contains last models for tic-tac-toe and connect4. 
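The closing evaluation wires the trained network into a Kaggle environment so two agents can play each other and the finished game is rendered as an animation. A hedged sketch, assuming the kaggle_environments package and a KaggleAgent wrapper like the one written in the video (the wrapper and its run method are named in the transcript; its constructor arguments here are assumptions):

from kaggle_environments import make

env = make("connectx")                     # Connect Four environment on Kaggle
player1 = KaggleAgent(model, game, args)   # hypothetical wrapper exposing .run(obs, config) -> column
player2 = KaggleAgent(model, game, args)   # the same trained model playing both sides
env.run([player1.run, player2.run])        # play one full game between the two agents
env.render(mode="ipython")                 # inline animation; the video's self-play matches end in draws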
potential follow-up on mu0 algorithm.', 'duration': 25.657, 'max_score': 14846.07, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB414846070.jpg'}], 'start': 12889.968, 'title': 'Refining mcts search method', 'summary': 'Introduces updates to the mcts search method, optimizing model calls for self-play games, implementing parallel multicolored research, and optimizing self-play game iteration, resulting in improved efficiency, reduced model calls, improved performance, and 5x speed improvement in training a connect4 model.', 'chapters': [{'end': 13132.658, 'start': 12889.968, 'title': 'Mcts search method update', 'summary': 'Introduces updates to the mcts search method, including renaming variables, adding noise to policies, and allocating policies to self-play games.', 'duration': 242.69, 'highlights': ['The chapter introduces updates to the MCTS search method, including renaming variables, adding noise to policies, and allocating policies to self-play games.', 'The method involves processing policies and values, obtaining valid moves, and multiplying the policy with them for each self-play game.', 'Adding noise to the policy requires ensuring the shape of the noise is equal to the shape of the policy, with the size being equal to policy.shape of 0.', 'The process involves looping over all self-play games, obtaining valid moves for each game, and multiplying the policy of the game with the valid moves.']}, {'end': 13549.521, 'start': 13133.618, 'title': 'Optimizing model calls for self-play games', 'summary': 'Discusses optimizing model calls for self-play games by batching states, parallelizing model calls, and identifying expandable games, resulting in improved efficiency and reduced model calls.', 'duration': 415.903, 'highlights': ['The chapter emphasizes the optimization of model calls by batching states, resulting in significant reduction in model calls for self-play games.', 'Parallelizing model calls is highlighted as a key strategy to improve efficiency in processing self-play games and reduce the number of model calls.', 'The process of identifying and handling expandable self-play games is explained, contributing to improved efficiency and reduced model calls.']}, {'end': 13830.952, 'start': 13550.202, 'title': 'Parallel implementation for multicolored research', 'summary': 'Discusses the implementation of a parallelized multicolored research code for a number of self-play games, enabling efficient allocation and processing of policies and values, as well as back propagation, resulting in improved performance and scalability.', 'duration': 280.75, 'highlights': ['Implementation of a parallelized multicolored research code for a number of self-play games The chapter focuses on implementing a parallelized multicolored research code for efficiently handling a number of self-play games.', 'Efficient allocation and processing of policies and values The implementation enables efficient allocation and processing of policies and values for improved performance.', 'Back propagation for improved performance and scalability The implementation includes back propagation to improve performance and scalability of the system.']}, {'end': 14085.472, 'start': 13832.273, 'title': 'Optimizing self-play game iteration', 'summary': 'Discusses optimizing the iteration over self-play games by flipping the range and updating the state, resulting in efficient removal of terminal self-play games and alignment between the index and the game being 
accessed.', 'duration': 253.199, 'highlights': ['Flipping the range while looping over self-play games allows for efficient removal of terminal games and ensures alignment between the index and the game being accessed.', 'Updating the state inside the self-play game class ensures that the state is correctly modified and updated.', 'Efficient removal of terminal self-play games from the list by deleting them at the correct position, resulting in a shortened self-play games list and maintaining loop functionality.']}, {'end': 14550.127, 'start': 14086.333, 'title': 'Improving connect4 model with parallelization', 'summary': "Details the process of training a connect4 model using parallelized f0 implementation, resulting in a 5x speed improvement, testing the model's performance, and making minor code adjustments, ultimately showcasing the model's ability to defeat the author with around 600 searches.", 'duration': 463.794, 'highlights': ["Training the model using parallelized F0 implementation resulted in a 5X speed improvement. The model's speed was improved by about five times using the parallelized F0 implementation.", 'The model, trained for eight iterations, demonstrated the ability to defeat the author with around 600 searches. Despite having around 600 searches, the model was able to defeat the author in the game of Connect Four.', "Minor code adjustments were made to optimize the model's performance, including utilizing temperature action props and optimizing the optimizer. Adjustments were made to utilize temperature action props and optimize the optimizer, enhancing the model's performance."]}, {'end': 14872.747, 'start': 14550.527, 'title': 'Implementing kaggle agents for connect4 and tic-tac-toe', 'summary': 'Demonstrates the implementation of kaggle agents for connect4 and tic-tac-toe, utilizing mcts merge search and neural network models, resulting in models capable of defending against all attacks and achieving draws in both games.', 'duration': 322.22, 'highlights': ['The chapter demonstrates the implementation of Kaggle agents for Connect4 and Tic-Tac-Toe The tutorial covers the implementation of Kaggle agents for both Connect4 and Tic-Tac-Toe games.', 'Models are capable of defending against all attacks and achieving draws in both games The neural network models, utilizing MCTS merge search, are shown to be able to defend against all possible attacks and achieve draws in both Connect4 and Tic-Tac-Toe games.', 'Tutorial includes the availability of Jupyter Notebook and models in a GitHub repository The tutorial offers access to Jupyter Notebooks and the last models for Tic-Tac-Toe and Connect4 in a GitHub repository for further experimentation.']}], 'duration': 1982.779, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wuSQpLinRB4/pics/wuSQpLinRB412889968.jpg', 'highlights': ['5X speed improvement in training a connect4 model using parallelized F0 implementation', 'Optimization of model calls by batching states, resulting in significant reduction in model calls for self-play games', 'Implementation of a parallelized multicolored research code for efficiently handling a number of self-play games', 'Efficient allocation and processing of policies and values for improved performance', 'Models capable of defending against all attacks and achieving draws in both Connect4 and Tic-Tac-Toe games']}], 'highlights': ['AlphaZero achieves magnificent performance in extremely complex board games such as Go, where the amount of legal board positions is 
significantly higher than the amount of atoms in our universe.', 'The machine learning system of AlphaZero learns all information just by playing with itself.', 'The algorithm can play chess and shogi in a very impressive way.', "The recent AlphaTensor paper by DeepMind showcased AlphaZero's capability to invent novel algorithms within mathematics.", 'The generated data from the self-play phase is utilized for training the AlphaZero model, highlighting its iterative learning process.', 'The AlphaZero model iteratively optimizes itself by playing with itself and using the gained information, repeating the cycle n number of times to reach a neural network capable of outperforming humans.', 'The neural network architecture involves taking the game state as input and producing a policy and a value as output, where the policy determines the promising actions based on the state, and the value indicates the desirability of the state for the player.', 'The Monte Carlo Tree Search algorithm involves creating a tree structure where each node stores the state, a winning count (w), and a visit count (n), enabling the determination of the most promising action based on win ratios and visit counts.', 'The selection phase involves walking down the tree by choosing the child with the highest UCB score, balancing a high winning ratio against a low visit count.', 'The adaptation of Monte Carlo tree search to the general AlphaZero algorithm involves two key changes.', 'The incorporation of the policy gained from the model into the search process leads to more frequent selection of children with high policy assignments.', "The value network is used for backpropagation, resulting in a drastic improvement in Monte Carlo tree search, and the model's ability to play the game and create a better model is emphasized.", 'The policy probabilities obtained from the neural network enable convenient expansion in all possible directions during the expansion phase, allowing for the creation of multiple nodes instead of just one.', 'The chapter covers the implementation of a working Tic-Tac-Toe game, including methods for checking game termination, determining the opponent, and testing the game functionality.', 'The search method is structured into selection, expansion, simulation, and backpropagation phases, iterating over a specified number of searches, and returns the visit count distribution for the children of all root nodes.', "The implementation includes the concept of changing player perspectives, allowing for the perception of a different player while retaining the original player's perspective.", 'The AlphaZero class is created for continuous learning, involving self-play, training, and learning methods.', 'The implementation of the training method in AlphaZero includes shuffling the training data to avoid repetitive batches and looping over all memory in batches, ensuring efficient and effective training of the model.', 'Implementing GPU support for faster training by declaring a device and setting it to CUDA if available, falling back to CPU if not.', 'The training iterations and hyperparameters for the AlphaZero model are adjusted, with the number of searches increased to 600, the number of iterations set to eight, and the batch size raised to 128 for the game of Connect Four.', 'The creation of the parallel classes AlphaZeroParallel and MCTSParallel to update the implementation.', '5x speed improvement in training a Connect4 model using the parallelized AlphaZero implementation', 'Optimization of model calls by batching states, resulting in
a significant reduction in model calls for self-play games', 'Implementation of a parallelized Monte Carlo tree search for efficiently handling a number of self-play games', 'Efficient allocation and processing of policies and values for improved performance', 'Models capable of defending against all attacks and achieving draws in both Connect4 and Tic-Tac-Toe games']}
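
The "adding noise to policies" highlights above note that the Dirichlet noise has to match the shape of the batched root policy, with the sample count taken from policy.shape[0]. A minimal sketch of that step, assuming NumPy and hypothetical `dirichlet_epsilon` / `dirichlet_alpha` hyperparameters of the kind the course keeps in an `args` dictionary:

```python
import numpy as np

def add_dirichlet_noise(policy, dirichlet_epsilon=0.25, dirichlet_alpha=0.3):
    """Mix Dirichlet noise into a batch of root policies.

    `policy` has shape (num_self_play_games, action_size); drawing the noise
    with size=policy.shape[0] yields one noise vector per self-play game, so
    the noise array has exactly the same shape as the policy batch.
    """
    noise = np.random.dirichlet([dirichlet_alpha] * policy.shape[1],
                                size=policy.shape[0])
    return (1 - dirichlet_epsilon) * policy + dirichlet_epsilon * noise
```

Per the highlights, the subsequent loop then multiplies each game's policy row by that game's valid moves and renormalizes it before expanding the root.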
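The "optimization of model calls by batching states" highlight can be illustrated with a single forward pass over all self-play games instead of one model call per game. This sketch assumes tutorial-style interfaces (an `spg.state` per self-play game, a `game.get_encoded_state` that accepts a batch, and a PyTorch `model` returning a policy and a value); the exact names are assumptions here, not the course's verbatim code:

```python
import numpy as np
import torch

@torch.no_grad()
def batched_policy_values(model, game, self_play_games, device):
    # Stack every game's board into one array: (num_games, rows, cols).
    states = np.stack([spg.state for spg in self_play_games])
    # Encode the whole batch at once; assumed shape (num_games, 3, rows, cols).
    encoded = game.get_encoded_state(states)
    policy, value = model(torch.tensor(encoded, dtype=torch.float32, device=device))
    # One softmax over the action dimension for the entire batch.
    policy = torch.softmax(policy, dim=1).cpu().numpy()
    return policy, value.cpu().numpy()
```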
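The "flipping the range" highlight refers to iterating over the self-play games backwards, so that finished games can be deleted from the list without shifting the indices of games that still have to be processed. A minimal sketch, where `self_play_games`, `game`, `spg.last_action`, and the memory bookkeeping are stand-ins for the tutorial's own variables:

```python
# Iterate from the last index to the first so `del` never disturbs
# the positions of games that are still waiting to be processed.
for i in range(len(self_play_games))[::-1]:
    spg = self_play_games[i]
    value, is_terminal = game.get_value_and_terminated(spg.state, spg.last_action)
    if is_terminal:
        # ... append this game's (state, action_probs, outcome) tuples to memory ...
        del self_play_games[i]  # the loop index stays aligned with the remaining games
```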
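Finally, the selection-phase highlights describe walking down the tree by picking the child with the highest UCB score, with AlphaZero additionally weighting the exploration term by the policy prior. A minimal sketch of that scoring function, assuming the node layout used in the course (a `visit_count`, a `value_sum`, and a `prior` per node) and an exploration constant `C`:

```python
import math

def get_ucb(parent, child, C=2):
    # Exploitation term: the child's mean value, flipped to the parent's
    # perspective and rescaled from [-1, 1] to [0, 1]; 0 for unvisited children.
    if child.visit_count == 0:
        q_value = 0
    else:
        q_value = 1 - (child.value_sum / child.visit_count + 1) / 2
    # Exploration term: shrinks as the child is visited more often and is
    # scaled by the policy prior, so moves the network likes are tried earlier.
    return q_value + C * (math.sqrt(parent.visit_count) / (child.visit_count + 1)) * child.prior

# Selection repeatedly picks the child that maximizes get_ucb(node, child).
```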