title
Lecture 8 | Deep Learning Software
description
In Lecture 8 we discuss the use of different software packages for deep learning, focusing on TensorFlow and PyTorch. We also discuss some differences between CPUs and GPUs.
Keywords: CPU vs GPU, TensorFlow, Keras, Theano, Torch, PyTorch, Caffe, Caffe2, dynamic vs static computational graphs
Slides: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture8.pdf
--------------------------------------------------------------------------------------
Convolutional Neural Networks for Visual Recognition
Instructors:
Fei-Fei Li: http://vision.stanford.edu/feifeili/
Justin Johnson: http://cs.stanford.edu/people/jcjohns/
Serena Yeung: http://ai.stanford.edu/~syyeung/
Computer Vision has become ubiquitous in our society, with applications in search, image understanding, apps, mapping, medicine, drones, and self-driving cars. Core to many of these applications are visual recognition tasks such as image classification, localization and detection. Recent developments in neural network (aka “deep learning”) approaches have greatly advanced the performance of these state-of-the-art visual recognition systems. This lecture collection is a deep dive into details of the deep learning architectures with a focus on learning end-to-end models for these tasks, particularly image classification. From this lecture collection, students will learn to implement, train and debug their own neural networks and gain a detailed understanding of cutting-edge research in computer vision.
Website:
http://cs231n.stanford.edu/
For additional learning opportunities please visit:
http://online.stanford.edu/
detail
{'title': 'Lecture 8 | Deep Learning Software', 'heatmap': [{'end': 1926.836, 'start': 1873.999, 'weight': 0.713}, {'end': 2066.963, 'start': 1965.485, 'weight': 0.737}, {'end': 3381.594, 'start': 3181.399, 'weight': 0.77}], 'summary': "Lecture at Stanford University covers deep learning software, emphasizing Google Cloud GPU instances, GPU acceleration benefits, the shift of deep learning frameworks to industry, TensorFlow variable operations, higher-level libraries in TensorFlow, PyTorch training essentials, and TensorFlow's challenges and comparisons with other deep learning frameworks.", 'chapters': [{'end': 68.161, 'segs': [{'end': 68.161, 'src': 'embed', 'start': 4.858, 'weight': 0, 'content': [{'end': 5.979, 'text': 'at Stanford University.', 'start': 4.858, 'duration': 1.121}, {'end': 12.884, 'text': "Hello? Okay, it's after 12, so I wanna get started.", 'start': 9.602, 'duration': 3.282}, {'end': 17.308, 'text': "So today, lecture eight, we're gonna talk about deep learning software.", 'start': 14.085, 'duration': 3.223}, {'end': 20.77, 'text': 'This is a super exciting topic because it changes a lot every year.', 'start': 17.968, 'duration': 2.802}, {'end': 24.593, 'text': "That also means it's a lot of work to give this lecture because it changes a lot every year.", 'start': 20.79, 'duration': 3.803}, {'end': 29.537, 'text': 'But as usual, a couple administrative notes before we dive into the material.', 'start': 25.774, 'duration': 3.763}, {'end': 34.341, 'text': 'So as a reminder, the project proposals for your course projects were due on Tuesday.', 'start': 30.077, 'duration': 4.264}, {'end': 42.166, 'text': 'So hopefully you all turned that in and hopefully you all have a somewhat good idea of what kind of projects you want to work on for the class.', 'start': 34.861, 'duration': 7.305}, {'end': 49.771, 'text': "So we're in the process of assigning TAs to projects based on what the project area is and the expertise of the TAs.", 'start': 43.167, 'duration': 
6.604}, {'end': 53.613, 'text': "So we'll have some more information about that in the next couple days, I think.", 'start': 50.271, 'duration': 3.342}, {'end': 60.597, 'text': "We're also in the process of grading assignment one, so stay tuned and we'll get those grades back to you as soon as we can.", 'start': 54.854, 'duration': 5.743}, {'end': 64.679, 'text': 'Another reminder is that assignment two has been out for a while.', 'start': 61.758, 'duration': 2.921}, {'end': 68.161, 'text': "That's gonna be due next week, a week from today, Thursday.", 'start': 64.879, 'duration': 3.282}], 'summary': 'Lecture 8 at Stanford discussed deep learning software and project deadlines.', 'duration': 63.303, 'max_score': 4.858, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc4858.jpg'}], 'start': 4.858, 'title': 'Deep learning software lecture', 'summary': 'Covers the lecture on deep learning software at Stanford University, with administrative notes on project proposals and assignment deadlines.', 'chapters': [{'end': 68.161, 'start': 4.858, 'title': 'Deep learning software lecture at Stanford', 'summary': 'Covers the lecture on deep learning software at Stanford University, including administrative notes on project proposals and assignment deadlines.', 'duration': 63.303, 'highlights': ['The lecture covers deep learning software, a topic that changes significantly every year.', 'Project proposals for course projects were due on Tuesday, and TAs are being assigned based on project areas and expertise.', 'Assignment two is due a week from today, next Thursday.']}], 'duration': 63.303, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc4858.jpg', 'highlights': ['The lecture covers deep learning software, a topic that changes significantly every year.', 'Project proposals for course projects were due on Tuesday, and TAs are being assigned based on project areas and 
expertise.', 'Assignment two is due a week from today, next Thursday.']}, {'end': 696.094, 'segs': [{'end': 91.991, 'src': 'embed', 'start': 68.181, 'weight': 0, 'content': [{'end': 75.245, 'text': "And again, when working on assignment two, remember to stop your Google Cloud instances when you're not working to try to preserve your credits.", 'start': 68.181, 'duration': 7.064}, {'end': 84.588, 'text': 'And another bit of confusion I just wanted to reemphasize is that for assignment two, you really only need to use GPU instances for the last notebook.', 'start': 77.145, 'duration': 7.443}, {'end': 91.991, 'text': "For all the first several notebooks, it's just in Python and NumPy, so you don't need any GPUs for those questions.", 'start': 84.968, 'duration': 7.023}], 'summary': 'Use GPU instances only for the last notebook in assignment two to conserve credits.', 'duration': 23.81, 'max_score': 68.181, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc68181.jpg'}, {'end': 454.641, 'src': 'embed', 'start': 426.233, 'weight': 5, 'content': [{'end': 429.013, 'text': 'So CPUs tend to have just a few cores.', 'start': 426.233, 'duration': 2.78}, {'end': 436.135, 'text': 'For consumer desktop CPUs these days, they might have something like four or six or maybe up to 10 cores.', 'start': 429.794, 'duration': 6.341}, {'end': 444.698, 'text': 'With hyper-threading technology, that means they can run, the hardware can physically run like maybe eight or up to 20 threads concurrently.', 'start': 437.115, 'duration': 7.583}, {'end': 454.641, 'text': 'So the CPU can maybe do 20 things in parallel at once, which is not a gigantic number, but those threads for a CPU are pretty powerful.', 'start': 445.198, 'duration': 9.443}], 'summary': 'Consumer desktop CPUs may have 4-10 cores and, with hyper-threading, run up to 20 threads in parallel.', 'duration': 28.408, 'max_score': 426.233, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc426233.jpg'}, {'end': 494.584, 'src': 'embed', 'start': 465.857, 'weight': 3, 'content': [{'end': 471.462, 'text': 'So for GPUs, we see that these sort of common top end consumer GPUs have thousands of cores.', 'start': 465.857, 'duration': 5.605}, {'end': 480.27, 'text': 'So the NVIDIA Titan XP, which is the current top of the line consumer GPU, has 3,840 cores.', 'start': 472.003, 'duration': 8.267}, {'end': 482.072, 'text': "So that's a crazy number.", 'start': 480.731, 'duration': 1.341}, {'end': 485.875, 'text': "That's like way more than the 10 cores that you'll get for a similarly priced CPU.", 'start': 482.112, 'duration': 3.763}, {'end': 494.584, 'text': "The downside of a GPU is that each of those cores, one, it runs at a much lower clock speed, and two, they really can't do quite as much.", 'start': 486.836, 'duration': 7.748}], 'summary': 'The NVIDIA Titan XP has 3,840 cores, significantly more than CPUs.', 'duration': 28.727, 'max_score': 465.857, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc465857.jpg'}, {'end': 572.848, 'src': 'embed', 'start': 547.027, 'weight': 4, 'content': [{'end': 551.551, 'text': 'whereas GPUs actually have their own RAM built into the chip.', 'start': 547.027, 'duration': 4.524}, {'end': 557.054, 'text': "There's a pretty large bottleneck communicating between the RAM in your system and the GPU,", 'start': 552.031, 'duration': 5.023}, {'end': 562.879, 'text': 'so that GPUs typically have their own relatively large block of memory within the card itself.', 'start': 557.054, 'duration': 5.825}, {'end': 572.848, 'text': 'And for the Titan XP, which again is maybe the current top of the line consumer card, this thing has 12 gigabytes of memory local to the GPU.', 'start': 564.02, 'duration': 8.828}], 'summary': 'The Titan XP GPU has 12 gigabytes of local memory, reducing the communication bottleneck 
with system RAM.', 'duration': 25.821, 'max_score': 547.027, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc547027.jpg'}, {'end': 622.546, 'src': 'embed', 'start': 592.646, 'weight': 2, 'content': [{'end': 596.309, 'text': 'And GPUs are maybe more specialized for these highly parallelizable algorithms.', 'start': 592.646, 'duration': 3.663}, {'end': 603.659, 'text': 'So the prototypical algorithm of something that works really, really well and is perfectly suited to a GPU is matrix multiplication.', 'start': 597.136, 'duration': 6.523}, {'end': 609.741, 'text': "So remember, in matrix multiplication on the left we've got a matrix composed of a bunch of rows,", 'start': 604.239, 'duration': 5.502}, {'end': 613.963, 'text': 'we multiply that on the right by another matrix composed of a bunch of columns,', 'start': 609.741, 'duration': 4.222}, {'end': 622.546, 'text': 'and then this produces another final matrix where each element in the output matrix is a dot product between one of the rows and one of the columns of the two input matrices.', 'start': 613.963, 'duration': 8.583}], 'summary': 'GPUs are suited for highly parallelizable algorithms, like matrix multiplication.', 'duration': 29.9, 'max_score': 592.646, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc592646.jpg'}], 'start': 68.181, 'title': 'Google Cloud instances and deep learning', 'summary': 'Emphasizes conserving Google Cloud credits by using GPU instances only when necessary, and reminds about the upcoming theoretical midterm exam. 
It also discusses the use of GPUs and CPUs in deep learning, highlighting differences in cores, memory, and suitability for parallelizable algorithms, emphasizing the advantage of GPUs for matrix multiplication and convolution.', 'chapters': [{'end': 146.876, 'start': 68.181, 'title': 'Google Cloud instances reminder and midterm recap', 'summary': 'Emphasizes the importance of conserving Google Cloud credits by using GPU instances only when necessary, and reminds students about the upcoming theoretical midterm exam which will be closed book and focused on understanding the presented material.', 'duration': 78.695, 'highlights': ["The midterm will be in class on Tuesday 5-9, and will be more theoretical, involving pen and paper working through different kinds of slightly more theoretical questions to check understanding of the material we've covered so far.", 'For assignment two, conserve credits by using GPU instances only for the last notebook, as the first several notebooks only require Python and NumPy without the need for GPUs.', "It's important to stop Google Cloud instances when not working to preserve credits."]}, {'end': 696.094, 'start': 147.577, 'title': 'Deep learning: GPUs and CPUs', 'summary': 'Discusses the use of GPUs and CPUs in deep learning, highlighting the differences in cores, memory, and suitability for parallelizable algorithms, emphasizing the advantage of GPUs for matrix multiplication and convolution.', 'duration': 548.517, 'highlights': ['GPUs have thousands of cores, making them suitable for parallelizable algorithms, with the NVIDIA Titan XP having 3,840 cores, enabling faster computation for massively parallel problems.', '
CPUs have a few cores, but with hyper-threading technology, they can run up to 20 threads concurrently, making them powerful for independent tasks.', 'GPUs have their own RAM built into the chip, with the NVIDIA Titan XP having 12 gigabytes of memory, enabling efficient communication and processing for parallel tasks.', 'GPUs are well suited for matrix multiplication and convolution due to their ability to parallelize computations for faster throughput, especially with large matrices.']}], 'duration': 627.913, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc68181.jpg', 'highlights': ['For assignment two, conserve credits by using GPU instances only for the last notebook, as the first several notebooks only require Python and NumPy without the need for GPUs.', "It's important to stop Google Cloud instances when not working to preserve credits.", 'GPUs are well suited for matrix multiplication and convolution due to their ability to parallelize computations for faster throughput, especially with large matrices.', 'The NVIDIA Titan XP has 3,840 cores, enabling faster computation for massively parallel problems.', 'GPUs have their own RAM built into the chip, with the NVIDIA Titan XP having 12 gigabytes of memory, enabling efficient communication and processing for parallel tasks.', 'CPUs have a few cores, but with hyper-threading technology, they can run up to 20 threads concurrently, making them powerful for independent tasks.']}, {'end': 1158.922, 'segs': [{'end': 773.383, 'src': 'embed', 'start': 732.94, 'weight': 1, 'content': [{'end': 738.605, 'text': "managing the memory 
hierarchy and making sure you don't have cache misses and branch mispredictions and all that sort of stuff.", 'start': 732.94, 'duration': 5.665}, {'end': 742.406, 'text': "So it's actually really, really hard to write performant CUDA code on your own.", 'start': 739.105, 'duration': 3.301}, {'end': 752.109, 'text': 'So as a result, NVIDIA has released a lot of libraries that implement common computational primitives that are very, very highly optimized for GPUs.', 'start': 742.946, 'duration': 9.163}, {'end': 753.49, 'text': 'So, for example,', 'start': 752.649, 'duration': 0.841}, {'end': 760.772, 'text': 'NVIDIA has a cuBLAS library that implements different kinds of matrix multiplications and different matrix operations that are super optimized,', 'start': 753.49, 'duration': 7.282}, {'end': 765.934, 'text': 'run really well on GPU, get very close to sort of theoretical peak hardware utilization.', 'start': 760.772, 'duration': 5.162}, {'end': 773.383, 'text': 'Similarly, they have a cuDNN library which implements things like convolution, forward and backward passes, batch normalization,', 'start': 766.795, 'duration': 6.588}], 'summary': 'NVIDIA offers optimized libraries for GPUs, e.g. 
cuBLAS and cuDNN, achieving close to peak hardware utilization.', 'duration': 40.443, 'max_score': 732.94, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc732940.jpg'}, {'end': 845.819, 'src': 'embed', 'start': 818.724, 'weight': 6, 'content': [{'end': 823.508, 'text': 'so it tends to be a lot less performant than the super optimized versions in CUDA.', 'start': 818.724, 'duration': 4.784}, {'end': 830.692, 'text': 'So maybe in the future we might see a bit of a more open standard and we might see this across many more types of platforms.', 'start': 824.108, 'duration': 6.584}, {'end': 835.034, 'text': 'But at least for now, NVIDIA is kind of the main game in town for deep learning.', 'start': 831.052, 'duration': 3.982}, {'end': 840.697, 'text': "So you can check, there's a lot of different resources for learning about how you can do GPU programming yourself.", 'start': 835.734, 'duration': 4.963}, {'end': 845.819, 'text': "It's kind of fun, it's sort of a different paradigm of writing code because it's this massively parallel architecture.", 'start': 841.317, 'duration': 4.502}], 'summary': 'NVIDIA dominates deep learning with GPUs, but open standards may rise in the future.', 'duration': 27.095, 'max_score': 818.724, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc818724.jpg'}, {'end': 916.899, 'src': 'embed', 'start': 889.996, 'weight': 0, 'content': [{'end': 899.265, 'text': 'then you typically see something like a 65 to 75 times speedup when running the exact same computation on a top of the line GPU,', 'start': 889.996, 'duration': 9.269}, {'end': 908.473, 'text': 'in this case a Pascal Titan X, versus a top of the line, well, not quite top of the line CPU, which in this case was an Intel E5 processor.', 'start': 899.265, 'duration': 9.208}, {'end': 916.899, 'text': "Although I'd like to make one sort of caveat here is that you 
always need to be super careful whenever you're reading any kind of benchmarks about deep learning,", 'start': 909.374, 'duration': 7.525}], 'summary': 'A GPU yields a 65-75x speedup compared to an Intel E5 CPU for deep learning tasks.', 'duration': 26.903, 'max_score': 889.996, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc889996.jpg'}, {'end': 1004.336, 'src': 'embed', 'start': 977.939, 'weight': 2, 'content': [{'end': 987.204, 'text': 'And you can see that if you compare the same networks on the same hardware with the same deep learning framework and the only difference is swapping out these cuDNN versus sort of handwritten,', 'start': 977.939, 'duration': 9.265}, {'end': 988.505, 'text': 'less optimized CUDA,', 'start': 987.204, 'duration': 1.301}, {'end': 996.871, 'text': 'you can see something like nearly a 3x speedup across the board when you switch from this relatively simple CUDA to these super optimized cuDNN implementations.', 'start': 988.505, 'duration': 8.366}, {'end': 1004.336, 'text': "So, in general, whenever you're writing code on GPU, you should probably almost always just make sure you're using cuDNN,", 'start': 997.471, 'duration': 6.865}], 'summary': 'Using cuDNN can lead to a nearly 3x speedup in deep learning tasks.', 'duration': 26.397, 'max_score': 977.939, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc977939.jpg'}, {'end': 1070.508, 'src': 'embed', 'start': 1035.727, 'weight': 3, 'content': [{'end': 1041.171, 'text': "it can compute forward and backward quite fast, but if you're reading sequentially off a spinning disk,", 'start': 1035.727, 'duration': 5.444}, {'end': 1044.673, 'text': 'you can actually bottleneck your training quite, and that can be really bad and slow you down.', 'start': 1041.171, 'duration': 3.502}, {'end': 1051.958, 'text': "So some solutions here are that if your data set's really small, sometimes you might 
just read the whole data set into RAM.", 'start': 1045.733, 'duration': 6.225}, {'end': 1056.181, 'text': "or even if your data set isn't so small but you have a giant server with a ton of RAM, you might do that anyway.", 'start': 1051.958, 'duration': 4.223}, {'end': 1062.225, 'text': "You can also make sure you're using an SSD instead of a hard drive, that can help a lot with read throughput.", 'start': 1056.941, 'duration': 5.284}, {'end': 1070.508, 'text': 'Another common strategy is to use multiple threads on the CPU that are prefetching data off RAM or off disk,', 'start': 1063.205, 'duration': 7.303}], 'summary': 'Reading sequentially from a spinning disk can bottleneck training, but solutions such as using SSDs and multiple CPU threads can improve read throughput.', 'duration': 34.781, 'max_score': 1035.727, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc1035727.jpg'}], 'start': 696.474, 'title': 'GPU acceleration in deep learning', 'summary': "Discusses the speed advantage of GPUs over CPUs in deep learning, with NVIDIA's cuBLAS and cuDNN libraries offering highly optimized computational primitives for GPUs, resulting in a 65 to 75 times speedup. 
It also compares CPU and GPU performance, highlighting a nearly 3x speedup with cuDNN libraries and strategies to mitigate potential bottlenecks in training due to disk I/O.", 'chapters': [{'end': 925.905, 'start': 696.474, 'title': 'GPU speed advantage in deep learning', 'summary': "Discusses the speed advantage of GPUs over CPUs in deep learning, with NVIDIA's cuBLAS and cuDNN libraries offering highly optimized computational primitives for GPUs, resulting in a 65 to 75 times speedup for certain computations compared to top-of-the-line CPUs.", 'duration': 229.431, 'highlights': ["NVIDIA's cuBLAS and cuDNN libraries: NVIDIA provides highly optimized libraries, such as cuBLAS and cuDNN, implementing computational primitives for GPUs, resulting in a 65 to 75 times speedup for certain computations compared to top-of-the-line CPUs.", 'GPU Speed Advantage over CPUs: GPUs offer a huge speed advantage over CPUs in deep learning, with a 65 to 75 times speedup for certain computations compared to top-of-the-line CPUs.', 'Challenges of Writing CUDA Code: Writing performant CUDA code is challenging, requiring careful management of memory hierarchy, cache misses, and branch mispredictions.', 'OpenCL in Deep Learning: OpenCL, while more general, is less performant than CUDA for deep learning due to the lack of optimized deep learning primitives.']}, {'end': 1158.922, 'start': 926.545, 'title': 'GPU vs CPU performance', 'summary': 'Discusses the performance comparison between CPU and GPU for deep learning, highlighting nearly a 3x speedup when using cuDNN libraries, potential bottlenecks in training due to disk I/O, and strategies to mitigate these bottlenecks.', 'duration': 232.377, 'highlights': ['Nearly a 3x speedup is observed when using cuDNN libraries compared to handwritten, less optimized CUDA implementations. 
Comparing networks on the same hardware with the same deep learning framework shows nearly a 3x speedup when switching from handwritten, less optimized CUDA to cuDNN implementations.', 'Potential bottlenecks in training due to disk I/O can significantly slow down the process. Sequential reading of data off a spinning disk can bottleneck the training process, causing significant slowdowns.', "Strategies to mitigate bottlenecks include prefetching data on the CPU, using an SSD instead of a hard drive, and reading the entire dataset into RAM if it's small."]}], 'duration': 462.448, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc696474.jpg', 'highlights': ['GPUs offer a huge speed advantage over CPUs in deep learning, with a 65 to 75 times speedup for certain computations compared to top-of-the-line CPUs.', 'NVIDIA provides highly optimized libraries, such as cuBLAS and cuDNN, implementing computational primitives for GPUs, resulting in a 65 to 75 times speedup for certain computations compared to top-of-the-line CPUs.', 'Nearly a 3x speedup is observed when using cuDNN libraries compared to handwritten, less optimized CUDA implementations.', "Strategies to mitigate bottlenecks involve prefetching data on the CPU, using an SSD instead of a hard drive, and reading the entire dataset into RAM if it's small.", 'Writing performant CUDA code is challenging, requiring careful management of memory hierarchy, cache misses, and branch mispredictions.', 'Sequential reading of data off a spinning disk can bottleneck the training process, causing significant slowdowns.', 'OpenCL, while more general, is less performant than CUDA for deep learning due to the lack of optimized deep learning primitives.']}, {'end': 2039.721, 'segs': [{'end': 1232.935, 'src': 'embed', 'start': 
1193.431, 'weight': 0, 'content': [{'end': 1197.454, 'text': 'It had not seen super widespread adoption yet at that time.', 'start': 1193.431, 'duration': 4.023}, {'end': 1201.838, 'text': 'But now I think in the last year, TensorFlow has gotten much more popular.', 'start': 1198.195, 'duration': 3.643}, {'end': 1204.66, 'text': "It's probably the main framework of choice for many people.", 'start': 1201.918, 'duration': 2.742}, {'end': 1207.122, 'text': "So that's a big change.", 'start': 1205.14, 'duration': 1.982}, {'end': 1211.964, 'text': "We've also seen a ton of new frameworks sort of popping up like mushrooms in the last year.", 'start': 1207.962, 'duration': 4.002}, {'end': 1217.947, 'text': 'So in particular, Caffe2 and PyTorch are new frameworks from Facebook that I think are pretty interesting.', 'start': 1212.905, 'duration': 5.042}, {'end': 1225.271, 'text': "There's also a ton of other frameworks, like Baidu has Paddle, Microsoft has CNTK,", 'start': 1218.628, 'duration': 6.643}, {'end': 1232.935, 'text': "Amazon is mostly using MXNet and there's a ton of other frameworks as well that I'm less familiar with and really don't have time to get into.", 'start': 1225.271, 'duration': 7.664}], 'summary': 'TensorFlow has seen widespread adoption with a surge in popularity, now being the main framework of choice for many, alongside the emergence of other new frameworks like Caffe2 and PyTorch from Facebook.', 'duration': 39.504, 'max_score': 1193.431, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc1193431.jpg'}, {'end': 1284.543, 'src': 'embed', 'start': 1252.723, 'weight': 2, 'content': [{'end': 1256.305, 'text': 'But these kind of next generation deep learning frameworks all originated in industry.', 'start': 1252.723, 'duration': 3.582}, {'end': 1260.47, 'text': 'So Caffe2 is from Facebook, PyTorch is from Facebook, TensorFlow is from Google.', 'start': 1256.745, 'duration': 3.725}, {'end': 
1268.559, 'text': "So that's kind of an interesting shift that we've seen in the landscape over the last couple of years is that these ideas have really moved a lot from academia into industry.", 'start': 1260.49, 'duration': 8.069}, {'end': 1272.243, 'text': 'And now industry is kind of giving us these big, powerful, nice frameworks to work with.', 'start': 1268.879, 'duration': 3.364}, {'end': 1278.018, 'text': 'So today I wanted to mostly talk about PyTorch and TensorFlow,', 'start': 1274.015, 'duration': 4.003}, {'end': 1284.543, 'text': 'because I personally think that those are probably the ones you should be focusing on for a lot of research-type problems these days.', 'start': 1278.018, 'duration': 6.525}], 'summary': 'Next-gen deep learning frameworks: Caffe2 from Facebook, PyTorch from Facebook, TensorFlow from Google. Shift from academia to industry. Focus on PyTorch and TensorFlow for research problems.', 'duration': 31.82, 'max_score': 1252.723, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc1252723.jpg'}, {'end': 1396.361, 'src': 'embed', 'start': 1371.897, 'weight': 3, 'content': [{'end': 1378.404, 'text': "there's really kind of three main reasons why you might want to use one of these deep learning frameworks rather than just writing your own code.", 'start': 1371.897, 'duration': 6.507}, {'end': 1388.094, 'text': 'So the first would be that these frameworks enable you to easily build and work with these big hairy computational graphs without kind of worrying about a lot of those bookkeeping details yourself.', 'start': 1379.025, 'duration': 9.069}, {'end': 1394.8, 'text': "Another major idea is that whenever we're working in deep learning, we always need to compute gradients.", 'start': 1389.095, 'duration': 5.705}, {'end': 1396.361, 'text': "We're always computing some loss.", 'start': 1395.1, 'duration': 1.261}], 'summary': 'Deep learning frameworks simplify graph building, gradient 
computation, and loss computation.', 'duration': 24.464, 'max_score': 1371.897, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc1371897.jpg'}, {'end': 1832.788, 'src': 'embed', 'start': 1806.119, 'weight': 4, 'content': [{'end': 1813.886, 'text': "We're just building up this computational graph data structure telling TensorFlow which operations we want to eventually run once we put in real data.", 'start': 1806.119, 'duration': 7.767}, {'end': 1817.398, 'text': 'So this is just building the graph, this is not actually doing anything.', 'start': 1815.157, 'duration': 2.241}, {'end': 1825.323, 'text': "Then we have this magical line where, after we've computed our loss with these symbolic operations,", 'start': 1818.819, 'duration': 6.504}, {'end': 1832.788, 'text': 'then we can just ask TensorFlow to compute the gradient of the loss with respect to w1 and w2 in this one magical, beautiful line.', 'start': 1825.323, 'duration': 7.465}], 'summary': 'In TensorFlow, you first build a symbolic computational graph, then ask for gradients in one line.', 'duration': 26.669, 'max_score': 1806.119, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc1806119.jpg'}, {'end': 1926.836, 'src': 'heatmap', 'start': 1873.999, 'weight': 0.713, 'content': [{'end': 1878.863, 'text': 'So TensorFlow just expects to receive data from NumPy arrays in most cases.', 'start': 1873.999, 'duration': 4.864}, {'end': 1889.211, 'text': "So here we're just creating concrete actual values for x, y, w1 and w2 using NumPy and then storing these in some dictionary.", 'start': 1879.403, 'duration': 9.808}, {'end': 1892.893, 'text': "And now here is where we're actually running the graph.", 'start': 1890.711, 'duration': 2.182}, {'end': 1898.217, 'text': "So you can see that we're calling session.run to actually execute some part of the graph.", 'start': 1893.333, 'duration': 4.884}, {'end': 1903.421, 'text': 'The 
first argument loss tells us which part of the graph do we actually want as output.', 'start': 1898.838, 'duration': 4.583}, {'end': 1910.907, 'text': 'So we actually want the graph, in this case we need to tell it that we actually want to compute loss and grad_w1 and grad_w2.', 'start': 1904.842, 'duration': 6.065}, {'end': 1916.152, 'text': 'And we need to pass in with this feed_dict parameter the actual concrete values that will be fed to the graph.', 'start': 1911.348, 'duration': 4.804}, {'end': 1926.836, 'text': "And then after, in this one line, it's going and running the graph and then computing those values for loss, grad_w1, grad_w2,", 'start': 1917.112, 'duration': 9.724}], 'summary': 'Using TensorFlow with NumPy arrays to execute the graph and compute values.', 'duration': 52.837, 'max_score': 1873.999, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc1873999.jpg'}, {'end': 2051.71, 'src': 'embed', 'start': 2020.576, 'weight': 5, 'content': [{'end': 2026.941, 'text': "we talked about GPU bottleneck and how it's very expensive actually to copy data between CPU memory and GPU memory.", 'start': 2020.576, 'duration': 6.365}, {'end': 2031.307, 'text': 'So if your network was very large and your weights and gradients were very big,', 'start': 2027.521, 'duration': 3.786}, {'end': 2034.512, 'text': 'then doing something like this would be super expensive and super slow,', 'start': 2031.307, 'duration': 3.205}, {'end': 2038.739, 'text': "because we'd be copying all kinds of data back and forth between the CPU and the GPU at every time step.", 'start': 2034.512, 'duration': 4.227}, {'end': 2039.721, 'text': "So that's bad.", 'start': 2039.18, 'duration': 0.541}, {'end': 2040.482, 'text': "We don't want to do that.", 'start': 2039.781, 'duration': 0.701}, {'end': 2041.243, 'text': 'We need to fix that.', 'start': 2040.542, 'duration': 0.701}, {'end': 2051.71, 'text': 'So obviously TensorFlow has some 
solution to this, and the idea is that now we want our weights w1 and w2, rather than being placeholders,', 'start': 2042.583, 'duration': 9.127}], 'summary': 'Copying data between CPU and GPU is expensive and slow. TensorFlow has a solution for large networks and big weights.', 'duration': 31.134, 'max_score': 2020.576, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc2020576.jpg'}], 'start': 1158.922, 'title': 'Deep learning frameworks', 'summary': 'Discusses the landscape of deep learning frameworks, focusing on the shift from academia to industry, the rise in popularity of TensorFlow, and the emergence of new frameworks like Caffe2 and PyTorch. It also covers the importance of computational graphs in deep learning, reasons for using deep learning frameworks, and the process of building and running a computational graph in TensorFlow, highlighting automatic gradient computation and GPU acceleration.', 'chapters': [{'end': 1312.415, 'start': 1158.922, 'title': 'Deep learning frameworks landscape', 'summary': 'Discusses the landscape of deep learning frameworks, focusing on the shift from academia to industry, the rise in popularity of TensorFlow, and the emergence of new frameworks like Caffe2 and PyTorch.', 'duration': 153.493, 'highlights': ['The rise in popularity of TensorFlow, becoming the main framework of choice for many people in the last year TensorFlow has seen much more popular adoption, becoming the main framework of choice for many people.', 'The emergence of new frameworks like Caffe2 and PyTorch, developed by Facebook, and other frameworks like Paddle, CNTK, and MXNet New frameworks like Caffe2 and PyTorch from Facebook, as well as other frameworks like Paddle, CNTK, and MXNet, have emerged in the landscape.', 'The shift in deep learning frameworks from academia to industry, with the next generation frameworks originating in industry The next generation of deep learning frameworks have shifted 
from academia to industry, with frameworks like Caffe2 from Facebook, PyTorch from Facebook, and TensorFlow from Google.']}, {'end': 2039.721, 'start': 1312.415, 'title': 'Deep learning frameworks and computational graphs', 'summary': 'Discusses the importance of computational graphs in deep learning, the reasons for using deep learning frameworks, and the process of building and running a computational graph in TensorFlow, highlighting the automatic gradient computation and GPU acceleration.', 'duration': 727.306, 'highlights': ['Deep learning frameworks enable building and working with complex computational graphs, automatically computing gradients, and efficiently running on GPUs. Deep learning frameworks provide ease in building and working with complex computational graphs, automatically computing gradients, and efficient GPU utilization.', 'The process of building and running a computational graph in TensorFlow involves defining symbolic variables, performing operations, and using TensorFlow sessions to execute the graph with concrete values for computation. In TensorFlow, the process of building and running a computational graph involves defining symbolic variables, performing operations, using TensorFlow sessions to execute the graph with concrete values for computation.', 'The inefficiency of copying data between NumPy arrays and TensorFlow, especially when working with large networks and big weights and gradients, can result in significant performance issues, particularly in GPU computation. 
Copying data between NumPy arrays and TensorFlow can be inefficient, leading to performance issues in GPU computation, especially with large networks and big weights and gradients.']}], 'duration': 880.799, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc1158922.jpg', 'highlights': ['The rise in popularity of TensorFlow, becoming the main framework of choice for many people in the last year', 'The emergence of new frameworks like Caffe2 and PyTorch, developed by Facebook, and other frameworks like Paddle, CNTK, and MXNet', 'The shift in deep learning frameworks from academia to industry, with the next generation frameworks originating in industry', 'Deep learning frameworks enable building and working with complex computational graphs, automatically computing gradients, and efficiently running on GPUs', 'The process of building and running a computational graph in TensorFlow involves defining symbolic variables, performing operations, and using TensorFlow sessions to execute the graph with concrete values for computation', 'The inefficiency of copying data between NumPy arrays and TensorFlow, especially when working with large networks and big weights and gradients, can result in significant performance issues, particularly in GPU computation']}, {'end': 2822.638, 'segs': [{'end': 2153.592, 'src': 'embed', 'start': 2129.728, 'weight': 0, 'content': [{'end': 2136.778, 'text': 'So now we use this assign function, which mutates these variables inside the computational graph.', 'start': 2129.728, 'duration': 7.05}, {'end': 2140.283, 'text': 'And now the mutated value will persist across multiple runs of the same graph.', 'start': 2137.098, 'duration': 3.185}, {'end': 2151.07, 'text': 'So now, when we run this graph and when we train the network now, we need to run the graph once, with a bit of special incantation to tell TensorFlow', 'start': 2142.242, 'duration': 8.828}, {'end': 2153.592, 'text': 'to set up 
these variables that are gonna live inside the graph.', 'start': 2151.07, 'duration': 2.522}], 'summary': 'Using the assign function mutates variables inside the computational graph, persisting across multiple runs.', 'duration': 23.864, 'max_score': 2129.728, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc2129728.jpg'}, {'end': 2548.52, 'src': 'embed', 'start': 2510.457, 'weight': 1, 'content': [{'end': 2512.578, 'text': 'We have to use this funny tf.group thing.', 'start': 2510.457, 'duration': 2.121}, {'end': 2513.638, 'text': "That's kind of a pain.", 'start': 2512.878, 'duration': 0.76}, {'end': 2519.579, 'text': 'So thankfully, TensorFlow gives you some convenience operations that kind of do that kind of stuff for you.', 'start': 2514.078, 'duration': 5.501}, {'end': 2521.38, 'text': "And that's called an optimizer.", 'start': 2520.42, 'duration': 0.96}, {'end': 2528.325, 'text': "So here, we're using tf.train.GradientDescentOptimizer, and we're telling it what learning rate we want to use.", 'start': 2521.94, 'duration': 6.385}, {'end': 2532.768, 'text': "And you can imagine that there's Adam, there's RMSProp, there's all kinds of different optimization algorithms here.", 'start': 2528.665, 'duration': 4.103}, {'end': 2536.05, 'text': 'And now we call optimizer.minimize of loss.', 'start': 2533.408, 'duration': 2.642}, {'end': 2548.52, 'text': 'And now this is a pretty magical thing, because now this call is aware that these variables w1 and w2 are marked as trainable by default.', 'start': 2537.731, 'duration': 10.789}], 'summary': "Using TensorFlow's optimizer, tf.train.GradientDescentOptimizer with a specified learning rate, to minimize the loss; the variables w1 and w2 are trainable by default.", 'duration': 38.063, 'max_score': 2510.457, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc2510457.jpg'}, {'end': 2749.077, 'src': 'embed', 'start': 
2718.542, 'weight': 2, 'content': [{'end': 2722.503, 'text': 'And it could be really annoying to like make sure you initialize the weights with the right shapes and all that sort of stuff.', 'start': 2718.542, 'duration': 3.961}, {'end': 2729.846, 'text': "So as a result, there's a bunch of sort of higher level libraries that wrap around TensorFlow and handle some of these details for you.", 'start': 2723.123, 'duration': 6.723}, {'end': 2736.029, 'text': 'So one example that ships with TensorFlow is this tf.layers inside.', 'start': 2730.766, 'duration': 5.263}, {'end': 2741.873, 'text': 'So now, in this code example, you can see that our code is only explicitly declaring the x and the y,', 'start': 2736.39, 'duration': 5.483}, {'end': 2743.854, 'text': 'which are the placeholders for the data and the labels.', 'start': 2741.873, 'duration': 1.981}, {'end': 2749.077, 'text': 'And now we say that h equals tf.layers.dense.', 'start': 2744.455, 'duration': 4.622}], 'summary': 'Higher level libraries handle tensorflow details, like tf.layers inside.', 'duration': 30.535, 'max_score': 2718.542, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc2718542.jpg'}], 'start': 2039.781, 'title': 'Tensorflow variable initialization, update, and optimization operations', 'summary': 'Covers the use of variables, initialization, and update operations in tensorflow, along with optimization techniques, convenience functions, and higher-level libraries. 
It emphasizes the need for proper training, addressing potential issues, and strategies leading to faster convergence.', 'chapters': [{'end': 2510.437, 'start': 2039.781, 'title': 'TensorFlow variable initialization and update operations', 'summary': 'Discusses the use of variables instead of placeholders in TensorFlow, the need for initializing and updating these variables within the computational graph, and the method to explicitly run update operations to ensure proper training, addressing potential issues and solutions along the way.', 'duration': 470.656, 'highlights': ['Variables in TensorFlow are used to persist values inside the computational graph and need to be explicitly initialized and updated within the graph Variables are defined instead of placeholders for weights w1 and w2 to persist inside the computational graph, and TensorFlow is responsible for initializing and updating them.', 'The use of the assign function in TensorFlow mutates variables inside the computational graph and ensures the mutated value persists across multiple runs of the same graph The assign function is used to mutate variables inside the computational graph, allowing the mutated value to persist across multiple runs of the same graph.', 'Explicitly running update operations in TensorFlow is necessary to ensure that the new values of variables are computed, addressing potential issues and solutions in the process Explicitly telling TensorFlow to perform update operations is necessary to ensure the computation of new values for variables, addressing potential issues and solutions in the process.']}, {'end': 2822.638, 'start': 2510.457, 'title': 'TensorFlow optimization and convenience operations', 'summary': "Discusses the usage of TensorFlow's optimizer, tf.global_variables_initializer, usage of convenience functions like tf.losses.mean_squared_error, and the usage of higher-level libraries like tf.layers.dense for handling architectural details. 
It also covers the benefits of using these operations and strategies, leading to faster convergence.", 'duration': 312.181, 'highlights': ['The usage of tf.train.GradientDescentOptimizer and its ability to perform gradient computation and update operations internally, providing convenience and automation in the optimization process.', 'The explanation of tf.global_variables_initializer, its role in initializing variables like w1 and w2, and the usage of tf.random_normal for generating concrete values, showcasing the internal workings of variable initialization within the graph.', "The demonstration of using tf.losses.mean_squared_error as a convenient function for computing L2 loss, reducing the need for explicit computation using basic tensor operations, highlighting the convenience and efficiency provided by TensorFlow's built-in functions.", 'The introduction of tf.layers.dense as a higher-level library for handling architectural details, such as setting up variables for weights and biases, utilizing Xavier initializer for initialization, and performing activation functions like ReLU, streamlining the process of building neural networks and improving convergence speed.']}], 'duration': 782.857, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc2039781.jpg', 'highlights': ['Variables in TensorFlow are used to persist values inside the computational graph and need to be explicitly initialized and updated within the graph', 'The usage of tf.train.GradientDescentOptimizer and its ability to perform gradient computation and update operations internally, providing convenience and automation in the optimization process', 'The introduction of tf.layers.dense as a higher-level library for handling architectural details, such as setting up variables for weights and biases, utilizing Xavier initializer for initialization, and performing activation functions like ReLU, streamlining the process of building neural networks 
and improving convergence speed', 'The use of the assign function in TensorFlow mutates variables inside the computational graph and ensures the mutated value persists across multiple runs of the same graph']}, {'end': 3518.155, 'segs': [{'end': 2854.279, 'src': 'embed', 'start': 2824.479, 'weight': 1, 'content': [{'end': 2827.081, 'text': "And you can see that we're using two calls to tf.layers,", 'start': 2824.479, 'duration': 2.602}, {'end': 2831.243, 'text': 'and this lets us build our model without doing all these explicit bookkeeping details ourselves.', 'start': 2827.081, 'duration': 4.162}, {'end': 2833.344, 'text': 'So this is maybe a little bit more convenient.', 'start': 2831.743, 'duration': 1.601}, {'end': 2838.647, 'text': 'But tf.contrib.layers is really not the only game in town.', 'start': 2834.705, 'duration': 3.942}, {'end': 2843.41, 'text': "There's a lot of different higher level libraries that people build on top of TensorFlow.", 'start': 2839.027, 'duration': 4.383}, {'end': 2846.972, 'text': "And it's kind of due to this basic impedance mismatch.", 'start': 2843.91, 'duration': 3.062}, {'end': 2854.279, 'text': "The computational graph is a relatively low level thing, but when we're working with neural networks, we have this concept of layers and weights,", 'start': 2847.652, 'duration': 6.627}], 'summary': 'Using tf.layers for model building, but other high-level libraries are available for TensorFlow.', 'duration': 29.8, 'max_score': 2824.479, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc2824479.jpg'}, {'end': 2887.266, 'src': 'embed', 'start': 2862.187, 'weight': 0, 'content': [{'end': 2867.852, 'text': "So that's where these various packages are trying to help you out and let you work at this higher layer of abstraction.", 'start': 2862.187, 'duration': 5.665}, {'end': 2872.221, 'text': 'So another very popular package that you may have seen before is Keras.', 'start': 2868.88, 
'duration': 3.341}, {'end': 2874.942, 'text': 'Keras is a very beautiful,', 'start': 2872.921, 'duration': 2.021}, {'end': 2881.324, 'text': 'nice API that sits on top of TensorFlow and handles sort of building up these computational graphs for you up in the backend.', 'start': 2874.942, 'duration': 6.382}, {'end': 2887.266, 'text': "So, by the way, Keras also supports Theano as a backend, so that's also kind of nice.", 'start': 2882.025, 'duration': 5.241}], 'summary': 'Various packages, including keras, enable working at a higher level of abstraction on top of tensorflow and theano.', 'duration': 25.079, 'max_score': 2862.187, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc2862187.jpg'}, {'end': 2994.556, 'src': 'embed', 'start': 2967.75, 'weight': 3, 'content': [{'end': 2973.154, 'text': "but there's these three different ones tf.layers, tf.slim and tf.contrib.learn,", 'start': 2967.75, 'duration': 5.404}, {'end': 2978.517, 'text': 'that all ship with TensorFlow that are all kind of doing a slightly different version of this higher level wrapper thing.', 'start': 2973.154, 'duration': 5.363}, {'end': 2985.33, 'text': "There's another framework also from Google but not shipping with TensorFlow called PrettyTensor that does the same sort of thing.", 'start': 2980.026, 'duration': 5.304}, {'end': 2988.712, 'text': 'And I guess none of these were good enough for DeepMind,', 'start': 2986.591, 'duration': 2.121}, {'end': 2994.556, 'text': 'because they went ahead a couple weeks ago and wrote and released their very own high-level TensorFlow wrapper called Sonnet.', 'start': 2988.712, 'duration': 5.844}], 'summary': 'Google offers multiple high-level tensorflow wrappers, but deepmind developed its own called sonnet.', 'duration': 26.806, 'max_score': 2967.75, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc2967750.jpg'}, {'end': 3043.562, 'src': 'embed', 
'start': 3009.406, 'weight': 4, 'content': [{'end': 3015.308, 'text': "There's some examples in TFSlim and in Keras, because remember, pre-trained models are super important when you're training your own things.", 'start': 3009.406, 'duration': 5.902}, {'end': 3020.929, 'text': "There's also this idea of TensorBoard, where you can load up your I don't want to get into the details,", 'start': 3016.268, 'duration': 4.661}, {'end': 3026.071, 'text': 'but TensorBoard you can add sort of instrumentation to your code and then plot losses and things as you go through the training process.', 'start': 3020.929, 'duration': 5.142}, {'end': 3032.53, 'text': 'TensorFlow also lets you run distributed, where you can break up a computational graph, run on different machines.', 'start': 3027.965, 'duration': 4.565}, {'end': 3039.177, 'text': "That's super cool, but I think probably not really anyone outside of Google is really using that to great success these days.", 'start': 3032.83, 'duration': 6.347}, {'end': 3043.562, 'text': 'But if you do wanna run distributed stuff, probably TensorFlow is the main game in town for that.', 'start': 3039.638, 'duration': 3.924}], 'summary': 'Pre-trained models are important; tensorboard for monitoring; tensorflow for distributed computing.', 'duration': 34.156, 'max_score': 3009.406, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc3009406.jpg'}, {'end': 3102.635, 'src': 'embed', 'start': 3076.938, 'weight': 6, 'content': [{'end': 3085.725, 'text': 'So PyTorch from Facebook is kind of different from TensorFlow in that we have sort of three explicit different layers of abstraction inside PyTorch.', 'start': 3076.938, 'duration': 8.787}, {'end': 3091.888, 'text': 'So PyTorch has this tensor object, which is just like a NumPy array.', 'start': 3086.445, 'duration': 5.443}, {'end': 3093.609, 'text': "It's just an imperative array.", 'start': 3092.509, 'duration': 1.1}, {'end': 3096.391, 
'text': "It doesn't know anything about deep learning, but it can run the GPU.", 'start': 3093.629, 'duration': 2.762}, {'end': 3102.635, 'text': 'We have this variable object, which is a node in a computational graph which builds up computational graphs.', 'start': 3096.851, 'duration': 5.784}], 'summary': 'Pytorch from facebook has three layers of abstraction: tensor object, variable object, and gpu compatibility.', 'duration': 25.697, 'max_score': 3076.938, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc3076938.jpg'}, {'end': 3148.288, 'src': 'embed', 'start': 3115.683, 'weight': 8, 'content': [{'end': 3121.206, 'text': 'you can think of the PyTorch tensor as fulfilling the same role as the NumPy array in TensorFlow.', 'start': 3115.683, 'duration': 5.523}, {'end': 3128.971, 'text': 'The PyTorch variable is similar to the TensorFlow tensor or variable or placeholder, which are all sort of nodes in a computational graph.', 'start': 3121.927, 'duration': 7.044}, {'end': 3138.217, 'text': 'And now the PyTorch module is kind of equivalent to these higher level things from tf.slim or tf.layers or Sonnet or these other higher level frameworks.', 'start': 3129.431, 'duration': 8.786}, {'end': 3139.678, 'text': 'So right away.', 'start': 3138.797, 'duration': 0.881}, {'end': 3148.288, 'text': 'one thing to notice about PyTorch is that because it ships with this higher level abstraction and one really nice higher level abstraction called modules on its own,', 'start': 3139.678, 'duration': 8.61}], 'summary': 'Pytorch provides higher level abstractions like modules for computational graph.', 'duration': 32.605, 'max_score': 3115.683, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc3115683.jpg'}, {'end': 3228.781, 'src': 'embed', 'start': 3202.936, 'weight': 7, 'content': [{'end': 3207.937, 'text': 'But the major difference between the PyTorch tensor and NumPy 
arrays is that PyTorch tensors can run on GPU.', 'start': 3202.936, 'duration': 5.001}, {'end': 3212.638, 'text': 'So all you have to do to make this code run on GPU is use a different data type.', 'start': 3208.437, 'duration': 4.201}, {'end': 3218.339, 'text': 'Rather than using torch.FloatTensor, you do torch.cuda.FloatTensor,', 'start': 3213.118, 'duration': 5.221}, {'end': 3223.06, 'text': 'cast all of your tensors to this new data type and everything runs magically on the GPU.', 'start': 3218.339, 'duration': 4.721}, {'end': 3227.62, 'text': 'So you should think of PyTorch tensors as just NumPy plus GPU.', 'start': 3223.64, 'duration': 3.98}, {'end': 3228.781, 'text': "That's exactly what it is.", 'start': 3227.94, 'duration': 0.841}], 'summary': 'PyTorch tensors run on GPU, using torch.cuda.FloatTensor for GPU computation.', 'duration': 25.845, 'max_score': 3202.936, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc3202936.jpg'}, {'end': 3364.657, 'src': 'embed', 'start': 3327.418, 'weight': 10, 'content': [{'end': 3328.419, 'text': 'Here in PyTorch,', 'start': 3327.418, 'duration': 1.001}, {'end': 3332.281, 'text': "instead, we're building up a new graph every time we do a forward pass,", 'start': 3328.419, 'duration': 3.862}, {'end': 3336.083, 'text': "and this makes the code look a bit cleaner and it has some other implications that we'll get to in a bit.", 'start': 3332.281, 'duration': 3.802}, {'end': 3343.488, 'text': 'So in PyTorch, you can define your own new autograd functions by defining the forward and backward in terms of tensors.', 'start': 3337.264, 'duration': 6.224}, {'end': 3348.17, 'text': 'This ends up looking kind of like the module layers code that you write for homework', 'start': 3344.188, 'duration': 3.982}, {'end': 3353.894, 'text': 'two, where you can implement forward and backward using tensor operations and then stick these things inside computational graphs.', 'start': 3348.17, 
'duration': 5.724}, {'end': 3364.657, 'text': "So here we're defining our own ReLU and then we can actually go and use our own ReLU operation and now stick it inside our computational graph and define our own operations this way.", 'start': 3354.474, 'duration': 10.183}], 'summary': 'PyTorch allows defining custom autograd functions for computational graphs.', 'duration': 37.239, 'max_score': 3327.418, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc3327418.jpg'}, {'end': 3396.287, 'src': 'heatmap', 'start': 3181.399, 'weight': 9, 'content': [{'end': 3189.766, 'text': "So you set up some random data, you use some operations to compute the forward pass, and then we're explicitly doing the backward pass ourselves.", 'start': 3181.399, 'duration': 8.367}, {'end': 3195.211, 'text': 'Just sort of back propping through the network, through the operations, just as you did on homework one.', 'start': 3190.607, 'duration': 4.604}, {'end': 3200.895, 'text': "And now we're doing a manual update of the weights using our learning rate and using our computed gradients.", 'start': 3195.871, 'duration': 5.024}, {'end': 3207.937, 'text': 'But the major difference between the PyTorch tensor and NumPy arrays is that PyTorch tensors can run on GPU.', 'start': 3202.936, 'duration': 5.001}, {'end': 3212.638, 'text': 'So all you have to do to make this code run on GPU is use a different data type.', 'start': 3208.437, 'duration': 4.201}, {'end': 3218.339, 'text': 'Rather than using torch.FloatTensor, you do torch.cuda.FloatTensor,', 'start': 3213.118, 'duration': 5.221}, {'end': 3223.06, 'text': 'cast all of your tensors to this new data type and everything runs magically on the GPU.', 'start': 3218.339, 'duration': 4.721}, {'end': 3227.62, 'text': 'So you should think of PyTorch tensors as just NumPy plus GPU.', 'start': 3223.64, 'duration': 3.98}, {'end': 3228.781, 'text': "That's exactly what it is.", 'start': 3227.94, 'duration': 0.841}, 
{'end': 3230.841, 'text': 'Nothing specific to deep learning.', 'start': 3229.161, 'duration': 1.68}, {'end': 3235.314, 'text': 'So the next layer of abstraction in PyTorch is the variable.', 'start': 3232.512, 'duration': 2.802}, {'end': 3238.897, 'text': 'So this is once we move from tensors to variables.', 'start': 3235.794, 'duration': 3.103}, {'end': 3243.2, 'text': "now we're building computational graphs and we're able to take gradients automatically and everything like that.", 'start': 3238.897, 'duration': 4.303}, {'end': 3254.448, 'text': 'So here if x is a variable, then x.data is a tensor and x.grad is another variable containing the gradients of the loss with respect to that tensor.', 'start': 3244.22, 'duration': 10.228}, {'end': 3257.55, 'text': 'so x.grad.data is an actual tensor containing those gradients.', 'start': 3254.448, 'duration': 3.102}, {'end': 3261.957, 'text': 'And PyTorch tensors and variables have the exact same API.', 'start': 3259.196, 'duration': 2.761}, {'end': 3265.279, 'text': 'So any code that worked on PyTorch tensors.', 'start': 3262.358, 'duration': 2.921}, {'end': 3268.881, 'text': 'you can just make them variables instead and run the same code,', 'start': 3265.279, 'duration': 3.602}, {'end': 3273.383, 'text': "except now you're building up a computational graph rather than just doing these imperative operations.", 'start': 3268.881, 'duration': 4.502}, {'end': 3278.825, 'text': 'So here, when we create these variables,', 'start': 3274.683, 'duration': 4.142}, {'end': 3286.369, 'text': 'each call to the variable constructor wraps a PyTorch tensor and then also gives a flag whether or not we want to compute gradients with respect to this variable.', 'start': 3278.825, 'duration': 7.544}, {'end': 3293.842, 'text': 'And now in the forward pass, it looks exactly like it did before in the case with tensors because they have the same API.', 'start': 3287.438, 'duration': 6.404}, {'end': 3299.145, 'text': "So now we're computing 
our predictions, we're computing our loss in kind of this imperative kind of way.", 'start': 3294.502, 'duration': 4.643}, {'end': 3304.489, 'text': 'And then we call loss.backward() and now all these gradients come out for us.', 'start': 3300.286, 'duration': 4.203}, {'end': 3311.492, 'text': 'And then we can make a gradient update step on our weights using the gradients that are now present in the w1.grad.data.', 'start': 3305.089, 'duration': 6.403}, {'end': 3317.694, 'text': 'So this ends up looking quite like the NumPy case except all the gradients come for free.', 'start': 3311.992, 'duration': 5.702}, {'end': 3325.537, 'text': "One thing to note that's kind of different between PyTorch and TensorFlow is that in the TensorFlow case we were building up this explicit graph,", 'start': 3318.895, 'duration': 6.642}, {'end': 3326.838, 'text': 'then running the graph many times.', 'start': 3325.537, 'duration': 1.301}, {'end': 3328.419, 'text': 'Here in PyTorch,', 'start': 3327.418, 'duration': 1.001}, {'end': 3332.281, 'text': "instead, we're building up a new graph every time we do a forward pass,", 'start': 3328.419, 'duration': 3.862}, {'end': 3336.083, 'text': "and this makes the code look a bit cleaner and it has some other implications that we'll get to in a bit.", 'start': 3332.281, 'duration': 3.802}, {'end': 3343.488, 'text': 'So in PyTorch, you can define your own new autograd functions by defining the forward and backward in terms of tensors.', 'start': 3337.264, 'duration': 6.224}, {'end': 3348.17, 'text': 'This ends up looking kind of like the module layers code that you write for homework', 'start': 3344.188, 'duration': 3.982}, {'end': 3353.894, 'text': 'two, where you can implement forward and backward using tensor operations and then stick these things inside computational graphs.', 'start': 3348.17, 'duration': 5.724}, {'end': 3364.657, 'text': "So here we're defining our own ReLU and then we can actually go and use our own ReLU operation and now 
stick it inside our computational graph and define our own operations this way.", 'start': 3354.474, 'duration': 10.183}, {'end': 3368.798, 'text': 'But most of the time you will probably not need to define your own autograd operations.', 'start': 3365.477, 'duration': 3.321}, {'end': 3372.899, 'text': 'Most of the times the operations you need will mostly be already implemented for you.', 'start': 3369.238, 'duration': 3.661}, {'end': 3381.594, 'text': 'So in TensorFlow we saw if we can move to something like Keras or tf.learn, and this gives us a higher level API to work with,', 'start': 3374.167, 'duration': 7.427}, {'end': 3383.315, 'text': 'rather than this raw computational graphs.', 'start': 3381.594, 'duration': 1.721}, {'end': 3390.642, 'text': 'The equivalent in PyTorch is the NN package where it provides these high level wrappers for working with these things.', 'start': 3383.896, 'duration': 6.746}, {'end': 3396.287, 'text': "But unlike TensorFlow there's only one of them and it works pretty well so just use that if you're using PyTorch.", 'start': 3392.003, 'duration': 4.284}], 'summary': 'Pytorch allows manual and automatic gradient computation, gpu support, and provides high-level wrappers for neural network operations.', 'duration': 214.888, 'max_score': 3181.399, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc3181399.jpg'}], 'start': 2824.479, 'title': 'Higher level libraries in tensorflow', 'summary': "Discusses the use of higher level libraries such as tf.layers and keras in tensorflow, highlighting the convenience and popularity of keras as an alternative, and also covers tensorflow and pytorch comparison, contrasting the design and features while emphasizing pytorch's ease of use and gpu capabilities.", 'chapters': [{'end': 2945.454, 'start': 2824.479, 'title': 'Higher level libraries in tensorflow', 'summary': 'Discusses the use of higher level libraries such as tf.layers and keras to 
simplify the building of computational graphs and training procedures in tensorflow, emphasizing the convenience and popularity of keras as an alternative to tf.layers.', 'duration': 120.975, 'highlights': ['Keras is a popular high-level API that simplifies building computational graphs and training procedures in TensorFlow Keras sits on top of TensorFlow, handling the building of computational graphs and training procedures, providing a more convenient and user-friendly approach.', 'tf.layers provides a convenient way to build models in TensorFlow without handling explicit bookkeeping details Using tf.layers allows the construction of models without dealing with explicit bookkeeping details, providing a more convenient method.', 'There are various higher level libraries built on top of TensorFlow to work at a higher layer of abstraction Different higher level libraries are developed to help work at a higher layer of abstraction, simplifying the use of computational graphs in TensorFlow.']}, {'end': 3518.155, 'start': 2947.39, 'title': 'Tensorflow and pytorch comparison', 'summary': "Discusses the high-level tensorflow wrappers including keras, tf.layers, tf.slim, tf.contrib.learn, prettytensor, and sonnet, while also covering the features of tensorflow such as pre-trained models, tensorboard, and distributed computing. it contrasts the design of tensorflow with theano, and then introduces pytorch, highlighting its three layers of abstraction and comparing them to tensorflow's components. the discussion delves into the features of pytorch tensors, variables, and modules, emphasizing the ease of use and the gpu capabilities. 
it concludes with a comparison of pytorch's nn package with keras in terms of high-level wrappers and provides insights into defining custom autograd functions and nn modules in pytorch.", 'duration': 570.765, 'highlights': ["TensorFlow's high-level wrappers include Keras, tf.layers, tf.slim, tf.contrib.learn, and PrettyTensor, and DeepMind created their own called Sonnet. The chapter discusses the various higher level TensorFlow wrappers such as Keras, tf.layers, tf.slim, tf.contrib.learn, and PrettyTensor, and highlights the introduction of DeepMind's high-level TensorFlow wrapper called Sonnet.", 'TensorFlow offers features such as pre-trained models, TFSlim, and TensorBoard for instrumentation during the training process. The chapter mentions the availability of pre-trained models in TensorFlow, the examples in TFSlim and Keras, and the functionality of TensorBoard for adding instrumentation to code and plotting losses during the training process.', "TensorFlow allows distributed computing, although it is mainly used within Google. TensorFlow's capability for running distributed computing is mentioned, with the observation that it is primarily utilized within Google.", "PyTorch provides three layers of abstraction: tensor object, variable object, and module object, each serving specific purposes in building computational graphs and neural networks. The detailed description of PyTorch's three layers of abstraction, including the tensor object, variable object, and module object, is provided to highlight their roles in constructing computational graphs and neural networks.", 'PyTorch tensors are similar to NumPy arrays and can be used to perform operations and computational tasks, with the added advantage of running on GPUs. 
The chapter emphasizes the similarity of PyTorch tensors to NumPy arrays and their ability to execute operations and computational tasks, particularly on GPUs.', 'PyTorch variables enable the building of computational graphs and automatic gradient computation, providing an imperative approach similar to using tensors. The functionality of PyTorch variables in constructing computational graphs and automatically computing gradients is highlighted, with a comparison to the imperative approach of using tensors.', "PyTorch's NN package serves as a higher-level wrapper for working with neural networks, offering simplicity and ease of use compared to TensorFlow's various high-level wrappers. The comparison between PyTorch's NN package and TensorFlow's high-level wrappers is discussed, emphasizing the simplicity and efficacy of using PyTorch's NN package for working with neural networks.", 'PyTorch allows the creation of custom autograd functions and NN modules, providing flexibility and control over model design and operations. 
The chapter details the capability of creating custom autograd functions and NN modules in PyTorch, highlighting the flexibility and control offered for designing models and operations.']}], 'duration': 693.676, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc2824479.jpg', 'highlights': ['Keras simplifies building computational graphs and training procedures in TensorFlow.', 'tf.layers provides a convenient way to build models in TensorFlow without explicit bookkeeping details.', 'Higher level libraries are developed to work at a higher layer of abstraction in TensorFlow.', 'TensorFlow offers high-level wrappers like Keras, tf.layers, tf.slim, tf.contrib.learn, and PrettyTensor.', 'TensorFlow provides features such as pre-trained models, TFSlim, and TensorBoard for instrumentation.', 'TensorFlow allows distributed computing, primarily used within Google.', 'PyTorch provides three layers of abstraction: tensor object, variable object, and module object.', 'PyTorch tensors are similar to NumPy arrays and can be used for operations and computational tasks, running on GPUs.', 'PyTorch variables enable building computational graphs and automatic gradient computation, similar to using tensors.', "PyTorch's NN package serves as a higher-level wrapper for working with neural networks, offering simplicity and ease of use.", 'PyTorch allows the creation of custom autograd functions and NN modules, providing flexibility and control over model design.']}, {'end': 4055.711, 'segs': [{'end': 3572.949, 'src': 'embed', 'start': 3534.828, 'weight': 1, 'content': [{'end': 3543.556, 'text': 'So this is like relatively characteristic of what you might see in a lot of PyTorch-type training scenarios, where you define your own class,', 'start': 3534.828, 'duration': 8.728}, {'end': 3549.481, 'text': 'defining your model that contains other modules and whatnot, and then you have some explicit training loop like this that runs it 
and updates it.', 'start': 3543.556, 'duration': 5.925}, {'end': 3555.649, 'text': 'One kind of nice quality of life thing that you have in PyTorch is a data loader.', 'start': 3551.583, 'duration': 4.066}, {'end': 3559.173, 'text': 'So a data loader can handle building mini-batches for you.', 'start': 3556.39, 'duration': 2.783}, {'end': 3562.237, 'text': 'It can handle some of the multi-threading that we talked about for you,', 'start': 3559.514, 'duration': 2.723}, {'end': 3566.663, 'text': 'where it can actually use multiple threads in the background to build mini-batches for you and stream off disk.', 'start': 3562.237, 'duration': 4.426}, {'end': 3572.949, 'text': 'So here, a data loader wraps a data set and provides some of these abstractions for you.', 'start': 3567.905, 'duration': 5.044}], 'summary': 'PyTorch training scenarios involve defining models, using data loaders for multi-threading, and building mini-batches.', 'duration': 38.121, 'max_score': 3534.828, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc3534828.jpg'}, {'end': 3631.618, 'src': 'embed', 'start': 3605.984, 'weight': 0, 'content': [{'end': 3610.829, 'text': "PyTorch provides pre-trained models, and this is probably the slickest pre-trained model experience I've ever seen.", 'start': 3605.984, 'duration': 4.845}, {'end': 3613.652, 'text': 'You just say torchvision.models.alexnet,', 'start': 3611.189, 'duration': 2.463}, {'end': 3614.353, 'text': 'pretrained equals', 'start': 3613.652, 'duration': 0.701}, {'end': 3616.375, 'text': "True, and that'll go down in the background,", 'start': 3614.353, 'duration': 2.022}, {'end': 3620.459, 'text': "download the pre-trained weights for you if you don't already have them, and then it's right there, you're good to go.", 'start': 3616.375, 'duration': 4.084}, {'end': 3622.121, 'text': 'So this is super easy to use.', 'start': 3620.899, 'duration': 1.222}, {'end': 3631.618, 'text': "PyTorch 
also has, there's also a package called Visdom that lets you visualize some of these loss statistics, somewhat similar to TensorBoard.", 'start': 3624.275, 'duration': 7.343}], 'summary': 'PyTorch offers a seamless pre-trained model experience, with easy access to tools like Visdom for visualization.', 'duration': 25.634, 'max_score': 3605.984, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc3605984.jpg'}, {'end': 3710.446, 'src': 'embed', 'start': 3686.315, 'weight': 2, 'content': [{'end': 3693.198, 'text': 'But kind of the high level differences between Torch and PyTorch are that Torch is actually in Lua, not Python, unlike these other things.', 'start': 3686.315, 'duration': 6.883}, {'end': 3695.619, 'text': 'So learning Lua is a bit of a turn off for some people.', 'start': 3693.358, 'duration': 2.261}, {'end': 3699.3, 'text': "Torch doesn't have autograd.", 'start': 3697.58, 'duration': 1.72}, {'end': 3702.682, 'text': "Torch is also older, so it's more stable, less susceptible to bugs.", 'start': 3699.721, 'duration': 2.961}, {'end': 3704.483, 'text': "There's maybe more example code for Torch.", 'start': 3702.722, 'duration': 1.761}, {'end': 3707.444, 'text': "They're about the same speeds, that's not really a concern.", 'start': 3705.563, 'duration': 1.881}, {'end': 3710.446, 'text': "But in PyTorch, it's in Python, which is great.", 'start': 3708.004, 'duration': 2.442}], 'summary': 'Torch is written in Lua, lacks autograd, and is older and more stable; PyTorch is in Python.', 'duration': 24.131, 'max_score': 3686.315, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc3686315.jpg'}, {'end': 3806.609, 'src': 'embed', 'start': 3780.464, 'weight': 4, 'content': [{'end': 3785.229, 'text': 'But I do want to talk a bit about some of the implications of static versus dynamic and what are the trade-offs of those two.', 'start': 3780.464, 'duration': 4.765}, {'end': 
3795.618, 'text': "So one kind of nice idea with static graphs is that because we're kind of building up one computational graph once and then reusing it many times,", 'start': 3787.15, 'duration': 8.468}, {'end': 3800.483, 'text': 'the framework might have the opportunity to go in and do optimizations on that graph and kind of fuse.', 'start': 3795.618, 'duration': 4.865}, {'end': 3806.609, 'text': 'some operations, reorder some operations, figure out the most efficient way to operate that graph, so it can be really efficient.', 'start': 3800.483, 'duration': 6.126}], 'summary': 'Static graphs allow for optimizations and efficiency due to reuse and computational graph building.', 'duration': 26.145, 'max_score': 3780.464, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc3780464.jpg'}], 'start': 3518.415, 'title': 'Pytorch training essentials and features', 'summary': 'Covers pytorch training basics, including defining a model, implementing a training loop, and utilizing data loaders, as well as discussing pytorch features such as pre-trained models, visdom visualization package, comparisons with torch, and implications of static versus dynamic graphs.', 'chapters': [{'end': 3603.54, 'start': 3518.415, 'title': 'Pytorch training basics', 'summary': 'Introduces the typical components of a pytorch training scenario, including defining a model, implementing a training loop, and utilizing data loaders to handle mini-batch creation and multi-threading.', 'duration': 85.125, 'highlights': ['PyTorch training involves defining a model, implementing a training loop, and utilizing data loaders to handle mini-batch creation and multi-threading.', 'A data loader in PyTorch can handle building mini-batches and multi-threading, providing abstractions for the user.', 'Users typically write their own data set class to read their particular type of data and then wrap it in a data loader for training.']}, {'end': 4055.711, 
'start': 3605.984, 'title': 'PyTorch features and comparison with Torch', 'summary': 'Discusses the ease of using pre-trained models in PyTorch, the visualization package Visdom, comparisons between PyTorch and Torch, and the implications of static versus dynamic graphs, highlighting the advantages and trade-offs of each approach.', 'duration': 449.727, 'highlights': ['PyTorch provides a slick pre-trained model experience, with easy access to pre-trained weights and simplicity of use. Calling torchvision.models.alexnet(pretrained=True) provides a smooth experience, downloading pre-trained weights in the background and ensuring ease of use.', 'Comparison between TensorBoard and Visdom, highlighting the visualization capabilities and the absence of computational graph visualization in Visdom. While Visdom allows visualization of loss statistics similar to TensorBoard, it lacks the feature to visualize the structure of the computational graph, which is a significant distinction.', 'Comparison between Torch and PyTorch, emphasizing the differences in language, autograd, stability, and existing code, leading to a preference for PyTorch due to its advantages. The comparison outlines the differences between Torch and PyTorch, including language, autograd, stability, and existing code, ultimately favoring PyTorch for its benefits.', 'Discussion of static versus dynamic graphs, highlighting the trade-offs and implications, including optimization opportunities and serialization advantages of static graphs. 
The section delves into the implications of static versus dynamic graphs, discussing the optimization potential and serialization advantages of static graphs in contrast to the cleaner code and conditional operation simplicity provided by dynamic graphs.']}], 'duration': 537.296, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc3518415.jpg', 'highlights': ['PyTorch provides a slick pre-trained model experience, with easy access to pre-trained weights and simplicity of use.', 'PyTorch training involves defining a model, implementing a training loop, and utilizing data loaders to handle mini-batch creation and multi-threading.', 'Comparison between Torch and PyTorch, emphasizing the differences in language, autograd, stability, and existing code, leading to a preference for PyTorch due to its advantages.', 'A data loader in PyTorch can handle building mini-batches and multi-threading, providing abstractions for the user.', 'Discussion of static versus dynamic graphs, highlighting the trade-offs and implications, including optimization opportunities and serialization advantages of static graphs.']}, {'end': 4677.054, 'segs': [{'end': 4164.142, 'src': 'embed', 'start': 4138.841, 'weight': 1, 'content': [{'end': 4145.483, 'text': 'And I kind of like that using PyTorch dynamic graphs, you can just use your favorite imperative programming constructs and it all works just fine.', 'start': 4138.841, 'duration': 6.642}, {'end': 4153.538, 'text': 'By the way, there actually is some very new library called TensorFlow Fold,', 'start': 4147.877, 'duration': 5.661}, {'end': 4159.56, 'text': 'which is kind of another one of these layers on top of TensorFlow that lets you implement dynamic graphs.', 'start': 4153.538, 'duration': 6.022}, {'end': 4164.142, 'text': 'You kind of write your own code using TensorFlow Fold.', 'start': 4161.06, 'duration': 3.082}], 'summary': 'Pytorch allows using imperative programming constructs, 
while tensorflow fold enables implementing dynamic graphs.', 'duration': 25.301, 'max_score': 4138.841, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc4138841.jpg'}, {'end': 4283.069, 'src': 'embed', 'start': 4259.03, 'weight': 0, 'content': [{'end': 4264.994, 'text': 'so the structure of the computational graph then kind of mirrors the structure of the input data and it could vary from data point to data point.', 'start': 4259.03, 'duration': 5.964}, {'end': 4270.518, 'text': 'So this type of thing seems kind of complicated and hairy to implement using TensorFlow,', 'start': 4265.855, 'duration': 4.663}, {'end': 4274.501, 'text': "but in PyTorch you can just kind of use normal Python control flow and it'll work out just fine.", 'start': 4270.518, 'duration': 3.983}, {'end': 4283.069, 'text': 'Another bit of more researchy application is this really cool idea that I like called neural module networks for visual question answering.', 'start': 4276.403, 'duration': 6.666}], 'summary': 'Pytorch allows simpler implementation of complex computational graphs compared to tensorflow.', 'duration': 24.039, 'max_score': 4259.03, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc4259030.jpg'}, {'end': 4361.062, 'src': 'embed', 'start': 4332.409, 'weight': 2, 'content': [{'end': 4339.593, 'text': "But as kind of a bigger point, I think that there's a lot of cool creative applications that people could do with dynamic computational graphs,", 'start': 4332.409, 'duration': 7.184}, {'end': 4343.575, 'text': "and maybe there aren't so many right now, just because it's been so painful to work with them.", 'start': 4339.593, 'duration': 3.982}, {'end': 4350.298, 'text': "So I think that there's a lot of opportunity for doing cool, creative things with dynamic computational graphs.", 'start': 4344.095, 'duration': 6.203}, {'end': 4353.36, 'text': "And maybe if you come up 
with cool ideas, we'll feature it in lecture next year.", 'start': 4350.698, 'duration': 2.662}, {'end': 4361.062, 'text': 'So I wanted to talk very briefly about Caffe, which is this framework from Berkeley,', 'start': 4355.116, 'duration': 5.946}], 'summary': 'Opportunity for creative applications with dynamic computational graphs, followed by an introduction to the Caffe framework.', 'duration': 28.653, 'max_score': 4332.409, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc4332409.jpg'}, {'end': 4522.473, 'src': 'embed', 'start': 4492.31, 'weight': 3, 'content': [{'end': 4495.052, 'text': 'I promise one slide, one or two slides on Caffe2.', 'start': 4492.31, 'duration': 2.742}, {'end': 4499.115, 'text': 'So Caffe2 is this successor to Caffe, which is from Facebook.', 'start': 4495.753, 'duration': 3.362}, {'end': 4502.277, 'text': "It's super new, it was only released a week ago.", 'start': 4500.395, 'duration': 1.882}, {'end': 4508.901, 'text': "So I really haven't had the time to form a super educated opinion about Caffe2 yet.", 'start': 4504.318, 'duration': 4.583}, {'end': 4512.644, 'text': 'but it uses static graphs, kind of similar to TensorFlow.', 'start': 4510.022, 'duration': 2.622}, {'end': 4517.448, 'text': 'Kind of like Caffe 1, the core is written in C++ and they have some Python interface.', 'start': 4513.545, 'duration': 3.903}, {'end': 4522.473, 'text': 'The difference is that now you no longer need to write your own Python scripts to generate prototxt files.', 'start': 4517.889, 'duration': 4.584}], 'summary': 'Caffe2, the successor to Caffe from Facebook, uses static graphs and was released a week ago.', 'duration': 30.163, 'max_score': 4492.31, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc4492310.jpg'}, {'end': 4671.629, 'src': 'embed', 'start': 4636.695, 'weight': 4, 'content': [{'end': 4640.117, 'text': "If you're focused on just writing research code, I think 
PyTorch is a great choice.", 'start': 4636.695, 'duration': 3.422}, {'end': 4646.243, 'text': "but it's a bit newer, it has less community support, less code out there, so it could be a bit of an adventure.", 'start': 4641.497, 'duration': 4.746}, {'end': 4649.327, 'text': 'If you want more of a well-trodden path, TensorFlow might be a better choice.', 'start': 4646.523, 'duration': 2.804}, {'end': 4654.854, 'text': "If you're interested in production deployment, you should probably look at Caffe, Caffe2, or TensorFlow.", 'start': 4650.628, 'duration': 4.226}, {'end': 4660.681, 'text': "And if you're really focused on mobile deployment, I think TensorFlow and Caffe2 both have some built-in support for that.", 'start': 4655.234, 'duration': 5.447}, {'end': 4664.544, 'text': "So it's kind of unfortunate there's not just like one global best framework.", 'start': 4661.241, 'duration': 3.303}, {'end': 4668.687, 'text': "It kind of depends on what you're actually trying to do, what applications you anticipate.", 'start': 4664.584, 'duration': 4.103}, {'end': 4671.629, 'text': 'But these are kind of my general advice on those things.', 'start': 4669.247, 'duration': 2.382}], 'summary': 'Pytorch for research, tensorflow for production, caffe/caffe2 for deployment, no single best framework.', 'duration': 34.934, 'max_score': 4636.695, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc4636695.jpg'}], 'start': 4056.031, 'title': 'Tensorflow and deep learning frameworks', 'summary': "Delves into tensorflow's challenges with dynamic graphs, functional programming constructs, and potential applications, and compares caffe, caffe2, tensorflow, and pytorch, highlighting their use cases and suggesting tensorflow for versatility, pytorch for research, and caffe/caffe2 for production and mobile deployment.", 'chapters': [{'end': 4350.298, 'start': 4056.031, 'title': 'Tensorflow dynamic graphs', 'summary': 'Discusses the 
challenges of working with dynamic computational graphs in tensorflow, the need for functional programming constructs, the comparison with pytorch, and potential applications like recurrent networks, recursive networks, and neural module networks for visual question answering.', 'duration': 294.267, 'highlights': ['The challenges of working with dynamic computational graphs in TensorFlow and the need for functional programming constructs. In TensorFlow, constructing dynamic computational graphs requires using functional programming operators to implement looping constructs, which can be confusing and hard to work with, compared to the more straightforward approach in PyTorch.', "Comparison with PyTorch's dynamic graphs and the ease of using normal Python control flow for imperative programming constructs. PyTorch's dynamic graphs allow the use of favorite imperative programming constructs with ease, unlike TensorFlow, which requires relearning a separate set of control flow operators.", 'Applications of dynamic graphs in recurrent networks, recursive networks, and neural module networks for visual question answering. Dynamic graphs find applications in recurrent networks for handling sequences of varying lengths, recursive networks for operating over graph or tree structures, and neural module networks for creating custom architectures based on input data, showing potential for cool, creative applications.']}, {'end': 4677.054, 'start': 4350.698, 'title': 'Choosing deep learning frameworks', 'summary': 'Discusses the differences between caffe, caffe2, tensorflow, and pytorch, emphasizing their use cases and suitability for research, production, and mobile deployment, suggesting tensorflow for versatility, pytorch for research, and caffe/caffe2 for production and mobile deployment.', 'duration': 326.356, 'highlights': ['Caffe2, TensorFlow, and PyTorch are discussed, emphasizing their use cases for research, production, and mobile deployment. 
The chapter discusses the differences between Caffe, Caffe2, TensorFlow, and PyTorch, emphasizing their use cases and suitability for research, production, and mobile deployment.', "Caffe2's use of static graphs, Python interface, and focus on production-oriented use cases is mentioned. Caffe2 utilizes static graphs, a Python interface, and is geared towards production-oriented use cases.", 'PyTorch is recommended for research due to its ease of use, while TensorFlow is suggested as a versatile choice for various projects, albeit needing a higher level wrapper. PyTorch is recommended for research due to its ease of use, while TensorFlow is suggested as a versatile choice for various projects, albeit needing a higher level wrapper.', 'Caffe and Caffe2 are highlighted as suitable for production scenarios and mobile deployment, while TensorFlow is mentioned as a safe bet for any project. Caffe and Caffe2 are highlighted as suitable for production scenarios and mobile deployment, while TensorFlow is mentioned as a safe bet for any project.']}], 'duration': 621.023, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6SlgtELqOWc/pics/6SlgtELqOWc4056031.jpg', 'highlights': ['Dynamic computational graphs in TensorFlow require functional programming constructs for implementing looping, which can be confusing compared to PyTorch.', "PyTorch's dynamic graphs allow the use of favorite imperative programming constructs with ease, unlike TensorFlow.", 'Dynamic graphs find applications in recurrent networks, recursive networks, and neural module networks, showing potential for cool, creative applications.', 'Caffe2 utilizes static graphs, a Python interface, and is geared towards production-oriented use cases.', 'PyTorch is recommended for research due to its ease of use, while TensorFlow is suggested as a versatile choice for various projects.', 'Caffe and Caffe2 are highlighted as suitable for production scenarios and mobile deployment, while 
TensorFlow is mentioned as a safe bet for any project.']}], 'highlights': ['The lecture covers deep learning software, a topic that changes significantly every year.', 'Project proposals for course projects were due on Tuesday, and TAs are being assigned based on project areas and expertise.', 'Assignment two is due a week from today, next Thursday.', 'For assignment two, conserve credits by using GPU instances only for the last notebook, as the first several notebooks only require Python and NumPy without the need for GPUs.', "It's important to stop Google Cloud instances when not working to preserve credits.", 'GPUs are well suited for matrix multiplication and convolution due to their ability to parallelize computations for faster throughput, especially with large matrices.', 'The Nvidia Titan XP has 3,840 cores, enabling faster computation for massively parallel problems.', 'GPUs have their own RAM built into the chip, with the Nvidia Titan XP having 12 gigabytes of memory, enabling efficient communication and processing for parallel tasks.', 'GPUs offer a huge speed advantage over CPUs in deep learning, with a 65 to 75 times speedup for certain computations compared to top-of-the-line CPUs.', 'NVIDIA provides highly optimized libraries, such as cuBLAS and cuDNN, implementing computational primitives for GPUs, resulting in a 65 to 75 times speedup for certain computations compared to top-of-the-line CPUs.', 'Nearly a 3x speedup is observed when using cuDNN libraries compared to handwritten, less optimized CUDA implementations.', "Strategies to mitigate bottlenecks involve prefetching data on the CPU, using SSD instead of a hard drive, and reading the entire dataset into RAM if it's small.", 'The rise in popularity of TensorFlow, becoming the main framework of choice for many people in the last year', 'The emergence of new frameworks like Caffe2 and PyTorch, developed by Facebook, and other frameworks like Paddle, CNTK, and MXNet', 'The shift in deep learning 
frameworks from academia to industry, with the next generation frameworks originating in industry', 'Deep learning frameworks enable building and working with complex computational graphs, automatically computing gradients, and efficiently running on GPUs', 'The process of building and running a computational graph in TensorFlow involves defining symbolic variables, performing operations, and using TensorFlow sessions to execute the graph with concrete values for computation', 'Variables in TensorFlow are used to persist values inside the computational graph and need to be explicitly initialized and updated within the graph', 'The usage of tf.train.GradientDescentOptimizer and its ability to perform gradient computation and update operations internally, providing convenience and automation in the optimization process', 'The introduction of tf.layers.dense as a higher-level library for handling architectural details, such as setting up variables for weights and biases, utilizing Xavier initializer for initialization, and performing activation functions like ReLU, streamlining the process of building neural networks and improving convergence speed', 'Keras simplifies building computational graphs and training procedures in TensorFlow.', 'tf.layers provides a convenient way to build models in TensorFlow without explicit bookkeeping details.', 'Higher level libraries are developed to work at a higher layer of abstraction in TensorFlow.', 'TensorFlow offers high-level wrappers like Keras, tf.layers, tf.slim, tf.contrib.learn, and PrettyTensor.', 'PyTorch provides three layers of abstraction: tensor object, variable object, and module object.', 'PyTorch tensors are similar to NumPy arrays and can be used for operations and computational tasks, running on GPUs.', 'PyTorch variables enable building computational graphs and automatic gradient computation, similar to using tensors.', "PyTorch's nn package serves as a higher-level wrapper for working with neural networks, 
offering simplicity and ease of use.", 'PyTorch provides a slick pre-trained model experience, with easy access to pre-trained weights and simplicity of use.', 'PyTorch training involves defining a model, implementing a training loop, and utilizing data loaders to handle mini-batch creation and multi-threading.', 'Comparison between Torch and PyTorch, emphasizing the differences in language, autograd, stability, and existing code, leading to a preference for PyTorch due to its advantages.', 'A data loader in PyTorch can handle building mini-batches and multi-threading, providing abstractions for the user.', 'Dynamic computational graphs in TensorFlow require functional programming constructs for implementing looping, which can be confusing compared to PyTorch.', "PyTorch's dynamic graphs allow the use of favorite imperative programming constructs with ease, unlike TensorFlow.", 'Dynamic graphs find applications in recurrent networks, recursive networks, and neural module networks, showing potential for cool, creative applications.', 'Caffe2 utilizes static graphs, a Python interface, and is geared towards production-oriented use cases.', 'PyTorch is recommended for research due to its ease of use, while TensorFlow is suggested as a versatile choice for various projects.', 'Caffe and Caffe2 are highlighted as suitable for production scenarios and mobile deployment, while TensorFlow is mentioned as a safe bet for any project.']}