title
Stanford CS224N: NLP with Deep Learning | Winter 2019 | Lecture 1 – Introduction and Word Vectors
description
For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: https://stanford.io/3CORGu1
This lecture covers many topics within Natural Language Understanding, including:
-The Course (10 min)
-Human language and word meaning (15 min)
-Word2vec introduction (15 min)
-Word2vec objective function gradients (25 min)
-Optimization basics (5 min)
-Looking at word vectors (10 min or less)
Professor Christopher Manning
Thomas M. Siebel Professor in Machine Learning, Professor of Linguistics and of Computer Science
Director, Stanford Artificial Intelligence Laboratory (SAIL)
To follow along with the course schedule and syllabus, visit: http://web.stanford.edu/class/cs224n/index.html#schedule
00:00 Introduction
00:41 Welcome
01:31 Overview for the lecture
01:56 Lecture Plan & Overview
02:02 Course logistics in brief
02:52 What do we hope to teach in this course?
05:39 Course work and grading policy
07:02 High-level plan for problem sets
#ChristopherManning #naturallanguageprocessing #deeplearning
detail
{'title': 'Stanford CS224N: NLP with Deep Learning | Winter 2019 | Lecture 1 – Introduction and Word Vectors', 'heatmap': [{'end': 2705.033, 'start': 2497.79, 'weight': 0.921}, {'end': 2997.591, 'start': 2796.365, 'weight': 0.71}, {'end': 4371.938, 'start': 4318.848, 'weight': 0.874}], 'summary': 'Lecture series covers nlp with deep learning, including word vectors and the word2vec algorithm, and aims to teach effective deep learning methods for nlp, language evolution, distributed word representations, word vector calculation and optimization, calculus in nlp, and word representation improvement using data science tools.', 'chapters': [{'end': 509.9, 'segs': [{'end': 71.926, 'src': 'embed', 'start': 27.024, 'weight': 6, 'content': [{'end': 34.689, 'text': 'some people could sort of squeeze towards the edges and make more accessible some of the seats that still exist in the classroom.', 'start': 27.024, 'duration': 7.665}, {'end': 42.807, 'text': "Okay Um, so, um, you know, it's really exciting and great to see so many people here.", 'start': 36.542, 'duration': 6.265}, {'end': 52.253, 'text': 'So, um, a hearty welcome to CS224N, occasionally also known as Ling284, which is Natural Language Processing with Deep Learning.', 'start': 42.947, 'duration': 9.306}, {'end': 59.558, 'text': 'Um, as just a sort of a personal anecdote, it still sort of blows my mind that so many people turn up to this class these days.', 'start': 52.273, 'duration': 7.285}, {'end': 67.303, 'text': 'So, for about the first decade that I taught NLP here, you know, the number of people I got each year was approximately 45.', 'start': 59.938, 'duration': 7.365}, {'end': 71.926, 'text': "Um. so it's an order of magnitude smaller than it is now,", 'start': 67.303, 'duration': 4.623}], 'summary': 'Cs224n class attendance has increased by an order of magnitude over the years, from 45 to a larger number.', 'duration': 44.902, 'max_score': 27.024, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo27024.jpg'}, {'end': 120.417, 'src': 'embed', 'start': 97.903, 'weight': 5, 'content': [{'end': 106.388, 'text': 'um very brief discussion and talk about um human language and word meaning, and then we want to get right into talking about um,', 'start': 97.903, 'duration': 8.485}, {'end': 112.372, 'text': "the first thing that we're doing, which is coming up with word vectors and looking at the word2vec algorithm,", 'start': 106.388, 'duration': 5.984}, {'end': 114.613, 'text': "and that'll then sort of fill up the rest of the class.", 'start': 112.372, 'duration': 2.241}, {'end': 120.417, 'text': 'There are still two seats right in the front row for someone who wants to sit right in front of me, just letting you know.', 'start': 115.234, 'duration': 5.183}], 'summary': 'Discussion on human language, word vectors, and word2vec algorithm in class', 'duration': 22.514, 'max_score': 97.903, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo97903.jpg'}, {'end': 208.627, 'src': 'embed', 'start': 182.618, 'weight': 0, 'content': [{'end': 188.82, 'text': 'starting off by reviewing some of the basics and then particularly talking about the kinds of techniques, including um,', 'start': 182.618, 'duration': 6.202}, {'end': 193.842, 'text': 'recurrent networks and attention, that are widely used for natural language processing models.', 'start': 188.82, 'duration': 5.022}, {'end': 202.805, 'text': 'Um, a second thing we wanna teach is a 
big picture understanding of human languages and some of the difficulties in understanding and producing them.', 'start': 194.722, 'duration': 8.083}, {'end': 208.627, 'text': "Of course, if you wanna know a lot about human languages, there's a whole linguistics department and you can do a lot of courses of that.", 'start': 203.245, 'duration': 5.382}], 'summary': 'Reviewing basics and discussing techniques for natural language processing models, along with understanding human languages and associated difficulties.', 'duration': 26.009, 'max_score': 182.618, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo182618.jpg'}, {'end': 267.28, 'src': 'embed', 'start': 242.142, 'weight': 1, 'content': [{'end': 247.906, 'text': "We're going to do word meaning dependency, parsing, machine translation and you have an option to do question answering, um,", 'start': 242.142, 'duration': 5.764}, {'end': 249.627, 'text': 'actually building systems for those.', 'start': 247.906, 'duration': 1.721}, {'end': 257.493, 'text': "Um, if you've been talking to friends who did the class in the last couple of years, um, here are the differences for this year.", 'start': 250.528, 'duration': 6.965}, {'end': 258.754, 'text': 'just to get things straight', 'start': 257.493, 'duration': 1.261}, {'end': 261.796, 'text': "Um, so we've updated some of the content of the course.", 'start': 259.154, 'duration': 2.642}, {'end': 266.139, 'text': "So, um, between, um, me and guest lectures, there's new content.", 'start': 261.815, 'duration': 4.324}, {'end': 267.28, 'text': 'What? That looked bad.', 'start': 266.379, 'duration': 0.901}], 'summary': 'Course content updated with new material and options for system building.', 'duration': 25.138, 'max_score': 242.142, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo242142.jpg'}, {'end': 333.663, 'src': 'embed', 'start': 302.937, 'weight': 2, 'content': [{'end': 308.221, 'text': "Um, this year we're gonna use PyTorch instead of TensorFlow, and we can talk about that more later too.", 'start': 302.937, 'duration': 5.284}, {'end': 316.086, 'text': "Um, we're having the assignments due before class on either Tuesday or Thursday, so you're not distracted and can come to class.", 'start': 308.241, 'duration': 7.845}, {'end': 326.334, 'text': "Um, so starting off, Um, yeah, so we're trying to give an easier, gentler ramp up, but on the other hand, a fast ramp up.", 'start': 316.646, 'duration': 9.688}, {'end': 333.663, 'text': "So we've got this first assignment which is sort of easy, um, but it's available right now and is due next Tuesday.", 'start': 326.614, 'duration': 7.049}], 'summary': 'Using pytorch, assignments due before class, gentle but fast ramp up, first easy assignment available now and due next tuesday.', 'duration': 30.726, 'max_score': 302.937, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo302937.jpg'}, {'end': 394.394, 'src': 'embed', 'start': 368.897, 'weight': 3, 'content': [{'end': 375.66, 'text': 'You can either do our default final project which is a good option for many people or you can do a custom final project.', 'start': 368.897, 'duration': 6.763}, {'end': 377.961, 'text': "I'll talk about that in a more in the beginning.", 'start': 375.68, 'duration': 2.281}, {'end': 380.922, 'text': 'Um, and this is not working well.', 'start': 378.401, 'duration': 2.521}, {'end': 390.21, 'text': 'Um. 
And so then, at the end um, we have a final poster, um presentation session at which your attendance is expected,', 'start': 381.343, 'duration': 8.867}, {'end': 394.394, 'text': "and we're gonna be having that um Wednesday in the evening.", 'start': 390.21, 'duration': 4.184}], 'summary': 'Choose between default or custom final project. attendance expected at wednesday evening presentation.', 'duration': 25.497, 'max_score': 368.897, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo368897.jpg'}, {'end': 462.28, 'src': 'embed', 'start': 433.684, 'weight': 4, 'content': [{'end': 443.733, 'text': 'Homework two is pure Python plus NumPy, but that will start to kind of teach you more about the sort of underlying how do we do deep learning.', 'start': 433.684, 'duration': 10.049}, {'end': 454.657, 'text': "if you're not so good or a bit rusty, um, or never seen, um, Python or NumPy, um, we're going to have an extra section on Friday.", 'start': 444.674, 'duration': 9.983}, {'end': 462.28, 'text': "So Friday from 1.30 to 2.50, um, in Skilling Auditorium, we'll have a section that's a Python review.", 'start': 454.778, 'duration': 7.502}], 'summary': 'Homework two focuses on python and numpy for deep learning. extra python review section on friday from 1.30 to 2.50 in skilling auditorium.', 'duration': 28.596, 'max_score': 433.684, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo433684.jpg'}], 'start': 5.49, 'title': 'Nlp with deep learning and course logistics', 'summary': 'Discusses the impact of ai and ml on modern society, with a focus on word vectors and the word2vec algorithm, and introduces the course logistics and goals, including teaching effective deep learning methods for nlp and practical system building.', 'chapters': [{'end': 114.613, 'start': 5.49, 'title': 'Cs224n: nlp with deep learning', 'summary': 'Discusses the significant increase in class attendance and the impact of artificial intelligence and machine learning on modern society, with the plan for the class including a brief introduction, course logistics, and a discussion on word vectors and the word2vec algorithm.', 'duration': 109.123, 'highlights': ['The number of students attending the NLP class has increased significantly, from approximately 45 to a much larger number, indicating the revolutionary impact of artificial intelligence and machine learning on modern society.', 'The plan for the class includes a brief introduction, course logistics, and a discussion on word vectors and the word2vec algorithm, indicating a focus on practical and technical aspects of natural language processing with deep learning.', 'Encouraging attendees to fill up the remaining seats by being civic-minded and making middle seats more accessible, showcasing the enthusiasm and popularity of the CS224N class.']}, {'end': 509.9, 'start': 115.234, 'title': 'Nlp and deep learning course logistics', 'summary': 'Introduces the course logistics and goals, including teaching effective deep learning methods for nlp, practical system building, and the changes in the course structure for the year, with a focus on pytorch and microsoft azure gpu computing.', 'duration': 394.666, 'highlights': ['The course aims to teach effective, modern methods for deep learning and techniques like recurrent networks and attention for NLP models.', 'The practical aspect of the course involves teaching students how to build systems for NLP problems like word 
meaning, dependency parsing, and machine translation.', 'The course has been updated with new content and will use PyTorch instead of TensorFlow, with a shift to five one-week assignments and no midterm, with a focus on using Microsoft Azure for GPU computing.', 'The grading system involves five assignments, a final project with custom options, and a participation component, emphasizing original work and adherence to collaboration policies.', 'The problem sets will include assignments using Python, NumPy, and PyTorch, with additional support provided for Python and NumPy review sessions and a shift to GPU computing using Microsoft Azure.']}], 'duration': 504.41, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo5490.jpg', 'highlights': ['The course aims to teach effective, modern methods for deep learning and techniques like recurrent networks and attention for NLP models.', 'The practical aspect of the course involves teaching students how to build systems for NLP problems like word meaning, dependency parsing, and machine translation.', 'The course has been updated with new content and will use PyTorch instead of TensorFlow, with a shift to five one-week assignments and no midterm, with a focus on using Microsoft Azure for GPU computing.', 'The grading system involves five assignments, a final project with custom options, and a participation component, emphasizing original work and adherence to collaboration policies.', 'The problem sets will include assignments using Python, NumPy, and PyTorch, with additional support provided for Python and NumPy review sessions and a shift to GPU computing using Microsoft Azure.', 'The plan for the class includes a brief introduction, course logistics, and a discussion on word vectors and the word2vec algorithm, indicating a focus on practical and technical aspects of natural language processing with deep learning.', 'The number of students attending the NLP class has increased significantly, from approximately 45 to a much larger number, indicating the revolutionary impact of artificial intelligence and machine learning on modern society.', 'Encouraging attendees to fill up the remaining seats by being civic-minded and making middle seats more accessible, showcasing the enthusiasm and popularity of the CS224N class.']}, {'end': 2058.675, 'segs': [{'end': 551.394, 'src': 'embed', 'start': 509.9, 'weight': 0, 'content': [{'end': 513.381, 'text': 'TensorFlow Chain or MXNet, um, etc.', 'start': 509.9, 'duration': 3.481}, {'end': 516.482, 'text': 'and then doing the computing on GPUs.', 'start': 513.381, 'duration': 3.101}, {'end': 521.344, 'text': "So, of course, since we're in the Huang building, we should, of course, be using, um, GPUs.", 'start': 516.501, 'duration': 4.843}, {'end': 528.506, 'text': "But, I mean, in general, the sort of parallelism scalability of GPUs is what's powered, um, most of modern deep learning.", 'start': 521.404, 'duration': 7.102}, {'end': 530.507, 'text': 'Okay The final project.', 'start': 529.026, 'duration': 1.481}, {'end': 534.908, 'text': 'So for the final project, um, there are two things that you can do.', 'start': 530.847, 'duration': 4.061}, {'end': 540.77, 'text': 'Um, so we have a default final project, which is essentially our final project, in a box,', 'start': 534.928, 'duration': 5.842}, {'end': 546.052, 'text': 'and so this is building a question answering system and we do it over the squad dataset.', 'start': 540.77, 'duration': 5.282}, {'end': 
551.394, 'text': 'So what you build and how you can improve your performance is completely up to you.', 'start': 546.333, 'duration': 5.061}], 'summary': 'Use gpus for parallel computing in deep learning; final project: question answering system over squad dataset.', 'duration': 41.494, 'max_score': 509.9, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo509900.jpg'}, {'end': 600.898, 'src': 'embed', 'start': 572.511, 'weight': 2, 'content': [{'end': 575.952, 'text': 'We will give you feedback, um, from someone as a mentor.', 'start': 572.511, 'duration': 3.441}, {'end': 582.154, 'text': 'Um, and either way, for only the final project, we allow teams of one, two, or three.', 'start': 576.373, 'duration': 5.781}, {'end': 585.014, 'text': "For the homeworks, you're expected to do them yourself.", 'start': 582.494, 'duration': 2.52}, {'end': 589.215, 'text': 'Though, of course, you can chat to people in a general way about the problems.', 'start': 585.154, 'duration': 4.061}, {'end': 592.516, 'text': 'Okay So that is the course.', 'start': 590.696, 'duration': 1.82}, {'end': 595.617, 'text': "All good? I'm not even behind schedule yet.", 'start': 593.316, 'duration': 2.301}, {'end': 600.898, 'text': 'Okay Um, so the next section is human language and word meaning.', 'start': 595.797, 'duration': 5.101}], 'summary': 'Course allows teams of one, two, or three for final project. homeworks expected to be done individually.', 'duration': 28.387, 'max_score': 572.511, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo572511.jpg'}, {'end': 691.651, 'src': 'embed', 'start': 668.56, 'weight': 3, 'content': [{'end': 679.447, 'text': 'That language is this amazing um human created system that is used for all sorts of purposes and is adaptable to all sorts of purposes.', 'start': 668.56, 'duration': 10.887}, {'end': 688.47, 'text': 'So you can do everything from describing mathematics in human language um to sort of nuzzling up to your best friend and getting them to understand you better.', 'start': 679.507, 'duration': 8.963}, {'end': 690.611, 'text': "So there's actually an amazing thing of human language.", 'start': 688.55, 'duration': 2.061}, {'end': 691.651, 'text': "Anyway, I'll just read it.", 'start': 690.771, 'duration': 0.88}], 'summary': 'Human language is an adaptable system used for diverse purposes, from describing mathematics to improving communication with friends.', 'duration': 23.091, 'max_score': 668.56, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo668560.jpg'}, {'end': 740.353, 'src': 'embed', 'start': 714.223, 'weight': 4, 'content': [{'end': 719.087, 'text': 'trying in vain to connect with one another by blindly flinging words out into the darkness.', 'start': 714.223, 'duration': 4.864}, {'end': 727.376, 'text': 'Every choice of phrasing and spelling and tone and timing carries countless signals and context and subtext and more.', 'start': 719.947, 'duration': 7.429}, {'end': 731.101, 'text': 'And every listener interprets those signals in their own way.', 'start': 727.997, 'duration': 3.104}, {'end': 733.344, 'text': "Language isn't a formal system.", 'start': 731.682, 'duration': 1.662}, {'end': 735.446, 'text': 'Language is glorious chaos.', 'start': 733.724, 'duration': 1.722}, {'end': 740.353, 'text': 'You can never know for sure what any words will mean to anyone.', 'start': 736.772, 'duration': 3.581}], 
'summary': 'Language is a chaotic system with countless signals and interpretations.', 'duration': 26.13, 'max_score': 714.223, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo714223.jpg'}, {'end': 877.675, 'src': 'embed', 'start': 832.592, 'weight': 5, 'content': [{'end': 843.94, 'text': 'And if you think about how we sort of, convey knowledge around the place in our human world, mainly the way we do it is through human language.', 'start': 832.592, 'duration': 11.348}, {'end': 845.301, 'text': 'You know some kinds of knowledge.', 'start': 843.98, 'duration': 1.321}, {'end': 849.224, 'text': 'you can sort of work out for yourself by doing physical stuff, right?', 'start': 845.301, 'duration': 3.923}, {'end': 851.946, 'text': "I can hold this and drop that, and I've learned something.", 'start': 849.264, 'duration': 2.682}, {'end': 853.587, 'text': "so I've learned a bit of knowledge there.", 'start': 851.946, 'duration': 1.641}, {'end': 861.533, 'text': "But sort of most of the knowledge in your heads and why you're sitting in this classroom has come from people communicating in human language to you.", 'start': 853.887, 'duration': 7.646}, {'end': 872.353, 'text': 'Um. so one of the famous- most famous deep learning people, Yann LeCun, he likes to say this line about oh, you know, really, I think that you know,', 'start': 862.353, 'duration': 10}, {'end': 877.675, 'text': "there's not much difference between um, the intelligence of a human being, and orangutan.", 'start': 872.353, 'duration': 5.322}], 'summary': 'Human language is the primary means of conveying knowledge, with yann lecun suggesting minimal difference between human and orangutan intelligence.', 'duration': 45.083, 'max_score': 832.592, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo832592.jpg'}, {'end': 945.126, 'src': 'embed', 'start': 916.244, 'weight': 7, 'content': [{'end': 925.813, 'text': "And I'd like to suggest to you, the reason for that is what human beings have achieved is we don't just have sort of one computer like a you know,", 'start': 916.244, 'duration': 9.569}, {'end': 929.636, 'text': "a dusty old IBM PC in your mother's garage.", 'start': 925.813, 'duration': 3.823}, {'end': 933.4, 'text': 'What we have is a human computer network.', 'start': 929.937, 'duration': 3.463}, {'end': 940.724, 'text': "And the way that we've achieved our human computer network is that we use human languages as our networking language.", 'start': 933.84, 'duration': 6.884}, {'end': 945.126, 'text': 'Um. 
and so when you think about it, um.', 'start': 941.504, 'duration': 3.622}], 'summary': 'Human achievement is a result of a vast human computer network using language as a networking tool.', 'duration': 28.882, 'max_score': 916.244, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo916244.jpg'}, {'end': 1112.449, 'src': 'embed', 'start': 1084.537, 'weight': 8, 'content': [{'end': 1089.279, 'text': "It's incredibly, incredibly recent on this scale of um evolution.", 'start': 1084.537, 'duration': 4.742}, {'end': 1095.562, 'text': 'But you know, essentially, writing was so powerful as a way of having knowledge that then,', 'start': 1089.339, 'duration': 6.223}, {'end': 1104.085, 'text': 'in those 5,000 years that ab- enabled human beings to go from um Stone Age sharp piece of flint to you know,', 'start': 1095.562, 'duration': 8.523}, {'end': 1108.507, 'text': 'having iPhones and all of these things and these incredibly sophisticated devices.', 'start': 1104.085, 'duration': 4.422}, {'end': 1112.449, 'text': "So language is pretty special thing I'd like to suggest.", 'start': 1108.947, 'duration': 3.502}], 'summary': 'Writing enabled human progress from stone age to modern technology in 5,000 years.', 'duration': 27.912, 'max_score': 1084.537, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo1084537.jpg'}, {'end': 1282.123, 'src': 'embed', 'start': 1247.244, 'weight': 9, 'content': [{'end': 1251.146, 'text': "So from that more ethereal level, I'll now move back to the concrete stuff.", 'start': 1247.244, 'duration': 3.902}, {'end': 1260.33, 'text': 'What we wanna do in this class is not solve the whole of language, but we want to represent the meaning of words right?', 'start': 1251.586, 'duration': 8.744}, {'end': 1263.151, 'text': 'So a lot of language is bound up in words and their meanings.', 'start': 1260.47, 'duration': 2.681}, {'end': 1266.153, 'text': 'Words can have really rich meanings, right?', 'start': 1263.871, 'duration': 2.282}, {'end': 1269.895, 'text': "As soon as you say a word, teacher, that's kind of got a lot of rich meaning.", 'start': 1266.193, 'duration': 3.702}, {'end': 1272.837, 'text': 'or you can have actions that have rich meaning.', 'start': 1269.895, 'duration': 2.942}, {'end': 1282.123, 'text': 'So, if I say a word like prognosticate or, um, dawdle or something, you know, these are words that have rich meanings and a lot of nuance on them.', 'start': 1272.857, 'duration': 9.266}], 'summary': 'Class aims to represent rich meanings of words with nuance.', 'duration': 34.879, 'max_score': 1247.244, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo1247244.jpg'}, {'end': 1408.149, 'src': 'embed', 'start': 1381.216, 'weight': 10, 'content': [{'end': 1385.997, 'text': 'In particular, favorite online thing was this online thesaurus called WordNet,', 'start': 1381.216, 'duration': 4.781}, {'end': 1391.359, 'text': 'which sort of tells you about word meanings and relationships between word meanings.', 'start': 1385.997, 'duration': 5.362}, {'end': 1399.503, 'text': "Um, so this is just giving you the very slightest sense of, um, of what's in WordNet.", 'start': 1391.959, 'duration': 7.544}, {'end': 1408.149, 'text': 'Um, so this is an actual bit of Python code up there which you can, um, type into your computer and run and do this for yourself.', 'start': 1399.943, 'duration': 8.206}], 'summary': 'Wordnet, 
an online thesaurus, provides word meanings and relationships through python code.', 'duration': 26.933, 'max_score': 1381.216, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo1381216.jpg'}, {'end': 1631.503, 'src': 'embed', 'start': 1606.019, 'weight': 11, 'content': [{'end': 1613.524, 'text': "And it turns out that, you know, WordNet doesn't actually do that that well because it just has these sort of fixed, discrete synonym sets.", 'start': 1606.019, 'duration': 7.505}, {'end': 1618.949, 'text': "So if you have a words in a synonym set that they're sort of a synonym, but maybe not exactly the same meaning.", 'start': 1613.564, 'duration': 5.385}, {'end': 1624.455, 'text': "If they're not in the same synonym set, you kind of can't really measure the partial resemblances of meaning for them.", 'start': 1619.05, 'duration': 5.405}, {'end': 1631.503, 'text': "So something like good and marvelous aren't in the same synonym set, but there's something that they share in common that you'd like to represent.", 'start': 1624.776, 'duration': 6.727}], 'summary': 'Wordnet has limitations in measuring partial resemblances of meaning due to fixed, discrete synonym sets.', 'duration': 25.484, 'max_score': 1606.019, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo1606019.jpg'}, {'end': 1771.564, 'src': 'embed', 'start': 1745.749, 'weight': 12, 'content': [{'end': 1751.032, 'text': 'So that you have put a one at the position and in neural net land we call these one-hot vectors.', 'start': 1745.749, 'duration': 5.283}, {'end': 1755.774, 'text': 'And so these might be our one-hot vectors for hotel and motel.', 'start': 1751.112, 'duration': 4.662}, {'end': 1758.676, 'text': 'So, there are a couple of things that are bad here.', 'start': 1756.454, 'duration': 2.222}, {'end': 1767.722, 'text': "Um, the one that's sort of a practical nuisance is you know languages have a lot of words, right?", 'start': 1758.696, 'duration': 9.026}, {'end': 1771.564, 'text': 'So sort of one of those dictionaries that you might have still had in school.', 'start': 1767.782, 'duration': 3.782}], 'summary': 'In neural networks, one-hot vectors represent words like hotel and motel, but dealing with many words is a practical nuisance.', 'duration': 25.815, 'max_score': 1745.749, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo1745749.jpg'}, {'end': 1974.529, 'src': 'embed', 'start': 1947.173, 'weight': 13, 'content': [{'end': 1950.335, 'text': "Okay And so that's gonna lead into these different ideas.", 'start': 1947.173, 'duration': 3.162}, {'end': 1953.876, 'text': 'So I mentioned before denotational semantics.', 'start': 1950.795, 'duration': 3.081}, {'end': 1961.74, 'text': "Here's another idea for representing the meaning of words, um, which is called distributional semantics.", 'start': 1954.297, 'duration': 7.443}, {'end': 1965.142, 'text': 'And so the idea of distributional semantics is well.', 'start': 1962.121, 'duration': 3.021}, {'end': 1968.664, 'text': 'how are we going to represent the meaning of a word?', 'start': 1965.142, 'duration': 3.522}, {'end': 1972.647, 'text': 'is by looking at the contexts, um, in which it appears.', 'start': 1968.664, 'duration': 3.983}, {'end': 1974.529, 'text': 'So this is a picture of J.R.', 'start': 1973.028, 'duration': 1.501}], 'summary': 'Discussion of denotational and distributional semantics for word meaning 
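The hotel/motel passage above can be made concrete in a few lines of NumPy. This is an illustrative sketch, not code from the lecture; the three-word vocabulary is made up, and real vocabularies run to hundreds of thousands of words:

```python
import numpy as np

# Illustrative sketch (not from the lecture) of one-hot word vectors:
# each word is a vector with a single 1 at its own vocabulary index.
vocab = ["hotel", "motel", "banking"]  # toy vocabulary; real ones are huge

def one_hot(word: str) -> np.ndarray:
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec

hotel, motel = one_hot("hotel"), one_hot("motel")
# Distinct one-hot vectors are orthogonal, so their dot product is 0:
# this representation gives "hotel" and "motel" no similarity at all.
print(hotel @ motel)  # 0.0
```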
representation.', 'duration': 27.356, 'max_score': 1947.173, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo1947173.jpg'}], 'start': 509.9, 'title': 'Language evolution and representation', 'summary': 'Covers the significance of gpus in deep learning, final project options with team size, complexities of human language, evolution of language and writing, impact of language on technology, limitations of wordnet, challenges of one-hot vectors, and distributional semantics in nlp.', 'chapters': [{'end': 595.617, 'start': 509.9, 'title': 'Deep learning course overview', 'summary': 'Covers the significance of using gpus, the two options for the final project - default question answering system over squad dataset and custom project proposal, and the team size allowed for the final project.', 'duration': 85.717, 'highlights': ['The significant role of GPUs in powering modern deep learning and the importance of parallelism scalability. Most of modern deep learning is powered by the parallelism scalability of GPUs.', 'Two options for the final project - default question answering system over squad dataset and the option to propose a custom final project, along with the team size allowed for the final project. The default final project involves building a question answering system over the squad dataset, while the custom final project can be proposed with approval, and both options allow teams of one, two, or three.', 'Expectation for homework to be completed individually, with the allowance for general discussions about the problems. Homeworks are expected to be completed individually, with the possibility for general discussions about the problems.']}, {'end': 915.764, 'start': 595.797, 'title': 'Human language and intelligence', 'summary': 'Discusses the complexities of human language, emphasizing its adaptability and the challenges of interpreting words, highlighting the importance of human language in conveying knowledge and the distinct intelligence of human beings compared to orangutans.', 'duration': 319.967, 'highlights': ['Human language is an adaptable and complex system used for various purposes, including describing mathematics and social interactions.', 'Interpreting words and their effects on people is a challenging and uncertain process, emphasizing the complexity and unpredictability of language.', 'The conveyance of knowledge in the human world primarily occurs through human language, highlighting its significance in shaping human intelligence.', "Yann LeCun's comparison of human intelligence to orangutans is challenged, emphasizing the unique intelligence of human beings compared to orangutans."]}, {'end': 1495.564, 'start': 916.244, 'title': 'Evolution of human language and writing', 'summary': 'Discusses the evolution of human language and writing, highlighting how language has enabled human beings to construct a powerful networked computer and how writing has facilitated the transmission of knowledge spatially and temporally, ultimately leading to the development of sophisticated devices like iphones. it also delves into the concept of meaning in words and the use of resources like wordnet for understanding word meanings computationally.', 'duration': 579.32, 'highlights': ['Language has enabled human beings to construct a powerful networked computer, making them invincible by facilitating effective communication and teamwork. 
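The WordNet lookups discussed in this chapter can be reproduced with NLTK's WordNet interface; a minimal sketch of that kind of query (assuming the nltk package and its wordnet corpus are installed, and using "good" as an illustrative word) is:

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)  # fetch the WordNet corpus once

# Print every synonym set (synset) containing the word "good",
# grouped by part of speech.
poses = {"n": "noun", "v": "verb", "a": "adj", "s": "adj (sat)", "r": "adv"}
for synset in wn.synsets("good"):
    lemmas = ", ".join(lemma.name() for lemma in synset.lemmas())
    print(f"{poses[synset.pos()]}: {lemmas}")
```

Each printed line is one fixed, discrete synonym set, which is exactly the limitation described above: "good" and "marvelous" overlap in meaning but sit in different synsets, and nothing in the structure measures that partial resemblance.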
Human language has allowed the construction of a powerful networked computer that is more powerful than individual intelligent creatures, enabling effective communication and teamwork.', 'Writing has facilitated the transmission of knowledge spatially and temporally, leading to the development of sophisticated devices like iPhones. The invention of writing allowed spatial and temporal transmission of knowledge, leading to the development of sophisticated devices like iPhones over a span of 5,000 years.', 'The concept of meaning in words is explored, focusing on the representation of rich meanings and nuances, and the use of denotational semantics in understanding word meanings. The chapter delves into the representation of rich meanings and nuances in words, as well as the use of denotational semantics to understand word meanings.', 'The use of resources like WordNet for understanding word meanings computationally is discussed, highlighting its capabilities in providing fine-grained distinctions between different senses of a word. The chapter explores the use of resources like WordNet for understanding word meanings computationally, emphasizing its ability to provide fine-grained distinctions between different senses of a word.']}, {'end': 2058.675, 'start': 1495.584, 'title': 'Word meaning representation', 'summary': 'Discusses the limitations of wordnet in representing word meaning, the challenges of using one-hot vectors for word representation, and the concept of distributional semantics as a more successful approach in modern statistical nlp.', 'duration': 563.091, 'highlights': ['The limitations of WordNet in representing word meaning, as it fails to capture nuances and lacks completeness, leading to the need for a different and better word meaning representation.', 'The challenges of using one-hot vectors for word representation, with problems arising from the vast number of words in a language and the inability to understand relationships and meanings between words.', 'The concept of distributional semantics, which involves representing the meaning of a word by analyzing the contexts in which it appears, proving to be a successful idea in modern statistical NLP.']}], 'duration': 1548.775, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo509900.jpg', 'highlights': ['Most of modern deep learning is powered by the parallelism scalability of GPUs.', 'Two options for the final project - default question answering system over squad dataset and the option to propose a custom final project, along with the team size allowed for the final project.', 'Expectation for homework to be completed individually, with the allowance for general discussions about the problems.', 'Human language is an adaptable and complex system used for various purposes, including describing mathematics and social interactions.', 'Interpreting words and their effects on people is a challenging and uncertain process, emphasizing the complexity and unpredictability of language.', 'The conveyance of knowledge in the human world primarily occurs through human language, highlighting its significance in shaping human intelligence.', "Yann LeCun's comparison of human intelligence to orangutans is challenged, emphasizing the unique intelligence of human beings compared to orangutans.", 'Language has enabled human beings to construct a powerful networked computer, making them invincible by facilitating effective communication and teamwork.', 'Writing has facilitated the 
transmission of knowledge spatially and temporally, leading to the development of sophisticated devices like iPhones.', 'The concept of meaning in words is explored, focusing on the representation of rich meanings and nuances, and the use of denotational semantics in understanding word meanings.', 'The use of resources like WordNet for understanding word meanings computationally is discussed, highlighting its capabilities in providing fine-grained distinctions between different senses of a word.', 'The limitations of WordNet in representing word meaning, as it fails to capture nuances and lacks completeness, leading to the need for a different and better word meaning representation.', 'The challenges of using one-hot vectors for word representation, with problems arising from the vast number of words in a language and the inability to understand relationships and meanings between words.', 'The concept of distributional semantics, which involves representing the meaning of a word by analyzing the contexts in which it appears, proving to be a successful idea in modern statistical NLP.']}, {'end': 2522.028, 'segs': [{'end': 2113.237, 'src': 'embed', 'start': 2088.371, 'weight': 1, 'content': [{'end': 2095.592, 'text': "And so for the distributed representation, we're still gonna represent the meaning of a word as a numeric vector.", 'start': 2088.371, 'duration': 7.221}, {'end': 2101.634, 'text': "But now we're gonna say that the meaning of each word is a smallish vector.", 'start': 2095.893, 'duration': 5.741}, {'end': 2107.515, 'text': "um, but it's going to be a dense vector, whereby all of the numbers are non-zero.", 'start': 2101.634, 'duration': 5.881}, {'end': 2113.237, 'text': 'So the meaning of banking is gonna be distributed over the dimensions of this vector.', 'start': 2107.916, 'duration': 5.321}], 'summary': 'Word meanings represented as small dense vectors for distribution over dimensions.', 'duration': 24.866, 'max_score': 2088.371, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo2088371.jpg'}, {'end': 2273.652, 'src': 'embed', 'start': 2251.374, 'weight': 2, 'content': [{'end': 2262.323, 'text': "um in the 2D projection and indeed some of what push things together in the 2D um projection may really really really misrepresent what's in the original space.", 'start': 2251.374, 'duration': 10.949}, {'end': 2268.908, 'text': "Um, but even looking at these 2D representations, the overall feeling is by gosh, this actually sort of works, doesn't it??", 'start': 2262.703, 'duration': 6.205}, {'end': 2273.652, 'text': 'Um, we can sort of see similarities, um between words.', 'start': 2269.268, 'duration': 4.384}], 'summary': '2d projection may misrepresent original space, but shows similarities between words.', 'duration': 22.278, 'max_score': 2251.374, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo2251374.jpg'}, {'end': 2491.667, 'src': 'embed', 'start': 2420.036, 'weight': 0, 'content': [{'end': 2427.346, 'text': 'in sort of turning the world of NLP in a neural networks direction was that picture.', 'start': 2420.036, 'duration': 7.31}, {'end': 2437.152, 'text': 'Um, was this algorithm that Tomas Mikolov came up with in 2013 called the Word2Vec algorithm.', 'start': 2427.786, 'duration': 9.366}, {'end': 2443.155, 'text': "So it wasn't the first work in having distributed representations of words.", 'start': 2437.492, 'duration': 5.663}, {'end': 2453.683, 'text': "So 
there was older work from Yoshua Bengio that went back to about the sort of turn in the millennium that somehow it sort of hadn't really sort of hit the world over the head and had a huge impact.\", 'start': 2443.235, 'duration': 10.448}, {'end': 2460.109, 'text': 'And it was really sort of Tomas Mikolov showed this very simple, very scalable way of learning.', 'start': 2453.963, 'duration': 6.146}, {'end': 2464.773, 'text': 'vector representations of um words and that sort of really opened the floodgates.', 'start': 2460.109, 'duration': 4.664}, {'end': 2468.216, 'text': "And so that's the algorithm that I'm gonna, um, show now.", 'start': 2465.093, 'duration': 3.123}, {'end': 2475.36, 'text': 'Okay So the idea of this algorithm is you start with a big pile of text.', 'start': 2468.897, 'duration': 6.463}, {'end': 2482.463, 'text': 'Um, so you- whatever you find you know web pages or newspaper articles or something a lot of continuous text right?', 'start': 2475.96, 'duration': 6.503}, {'end': 2486.125, 'text': 'Actual sentences, because we want to learn word meaning context.', 'start': 2482.503, 'duration': 3.622}, {'end': 2491.667, 'text': 'Um, NLP people, um, call a large pile of text a corpus.', 'start': 2486.485, 'duration': 5.182}], 'summary': 'Word2vec algorithm revolutionized nlp with simple, scalable word vector representations.', 'duration': 71.631, 'max_score': 2420.036, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo2420036.jpg'}], 'start': 2060.295, 'title': 'Distributed word representations and word2vec algorithm in nlp', 'summary': "covers the concept of distributed word representations, where words are represented as numeric vectors, typically ranging from 50 to 300 dimensions, reflecting similarities and groupings. it also discusses the word2vec algorithm's impact on nlp, and the process of learning vector representations of words from a large corpus of text, leading to a significant shift in the field towards neural networks.", 'chapters': [{'end': 2273.652, 'start': 2060.295, 'title': 'Distributed word representations', 'summary': 'Discusses the concept of distributed word representations, where the meaning of each word is represented as a dense numeric vector, with typical dimensions ranging from 50 to 300, providing a visual representation that reflects similarities and groupings between words.', 'duration': 213.357, 'highlights': ['The meaning of each word is represented as a dense numeric vector, with typical dimensions ranging from 50 to 300. The distributed representation of words involves using a numeric vector to represent the meaning of each word, with typical dimensions ranging from 50 to 300, providing a denser, non-zero vector representation.', 'Visual representation reflects similarities and groupings between words. The visual representation of the vector space shows similarities and groupings between words, such as countries, verbs, and morphological forms, providing an intuitive insight into word relationships.', 'Projection of 100-dimensional word vectors into two dimensions for visual representation. 
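A hedged sketch of the 2D projection idea described in these highlights: the word vectors below are random stand-ins (the lecture's plots come from trained 100-dimensional vectors), and the projection is a plain PCA computed via SVD, so it only illustrates the mechanics:

```python
import numpy as np

# Project pretend 100-d word vectors down to 2D with PCA (via SVD).
rng = np.random.default_rng(0)
word_vectors = rng.normal(size=(1000, 100))   # stand-in: 1,000 words, 100-d

centered = word_vectors - word_vectors.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
coords_2d = centered @ vt[:2].T               # top two principal components
print(coords_2d.shape)                        # (1000, 2): ready to scatter-plot
```

As the transcript warns, any such projection can pull together points that are far apart in the original high-dimensional space.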
The 100-dimensional word vectors are projected down into two dimensions for visualization, although it may lose some detail and misrepresent the original space, it still provides a meaningful and functional visualization of word similarities.']}, {'end': 2522.028, 'start': 2274.873, 'title': 'Word2vec algorithm in nlp', 'summary': 'Discusses the word2vec algorithm, its impact on nlp, and the process of learning vector representations of words from a large corpus of text, which led to a significant shift in the field of nlp towards neural networks.', 'duration': 247.155, 'highlights': ['The Word2Vec algorithm, introduced by Tomas Mikolov in 2013, had a significant impact on the field of NLP and led to a shift towards neural networks. The Word2Vec algorithm, introduced by Tomas Mikolov in 2013, was a pivotal moment in the shift of NLP towards neural networks, enabling the learning of vector representations of words from a large corpus of text.', 'The algorithm involves learning word meaning context from a large corpus of text, such as web pages or newspaper articles. The algorithm involves processing a large pile of text, referred to as a corpus in NLP, to learn word meaning context, typically from sources like web pages or newspaper articles.', 'The Word2Vec algorithm provides a simple and scalable way of learning vector representations of words. Tomas Mikolov presented a simple and scalable way of learning vector representations of words, which significantly impacted the field of NLP and led to the widespread adoption of neural networks.']}], 'duration': 461.733, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo2060295.jpg', 'highlights': ['The Word2Vec algorithm, introduced by Tomas Mikolov in 2013, had a significant impact on the field of NLP and led to a shift towards neural networks.', 'The distributed representation of words involves using a numeric vector to represent the meaning of each word, with typical dimensions ranging from 50 to 300, providing a denser, non-zero vector representation.', 'The visual representation of the vector space shows similarities and groupings between words, such as countries, verbs, and morphological forms, providing an intuitive insight into word relationships.', 'The algorithm involves learning word meaning context from a large corpus of text, such as web pages or newspaper articles.', 'The 100-dimensional word vectors are projected down into two dimensions for visualization, although it may lose some detail and misrepresent the original space, it still provides a meaningful and functional visualization of word similarities.', 'Tomas Mikolov presented a simple and scalable way of learning vector representations of words, which significantly impacted the field of NLP and led to the widespread adoption of neural networks.']}, {'end': 3294, 'segs': [{'end': 2757.83, 'src': 'embed', 'start': 2726.684, 'weight': 2, 'content': [{'end': 2731.988, 'text': "and we're then gonna be able to use that to predict what other words occur in a way I'm about to show you.", 'start': 2726.684, 'duration': 5.304}, {'end': 2741.335, 'text': "Okay, So um, that's our likelihood, and so what we do in all of these models is we sort of define an objective function,", 'start': 2732.629, 'duration': 8.706}, {'end': 2750.302, 'text': "and then we're going to be able to want to come up with vector representations of words in such a way as to minimize our objective function.", 'start': 2741.335, 'duration': 8.967}, {'end': 
2757.83, 'text': "Um, So objective function is basically the same as what's on the top half of the slide, but we change a couple of things.", 'start': 2750.322, 'duration': 7.508}], 'summary': 'Using likelihood and objective function to predict word occurrences and minimize objective function.', 'duration': 31.146, 'max_score': 2726.684, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo2726684.jpg'}, {'end': 2997.591, 'src': 'heatmap', 'start': 2796.365, 'weight': 0.71, 'content': [{'end': 2806.168, 'text': "So when we do that, we've then got a log of all these products which will allow us to turn things into a sums of the log of this probability.", 'start': 2796.365, 'duration': 9.803}, {'end': 2810.23, 'text': "And we'll go through that again in just a minute.", 'start': 2806.369, 'duration': 3.861}, {'end': 2820.94, 'text': 'Okay, Um, and so if we can min- if we can change our vector representations of these words so as to minimize this J of Theta,', 'start': 2811.286, 'duration': 9.654}, {'end': 2824.946, 'text': "that means we'll be good at predicting words in the context of another word.", 'start': 2820.94, 'duration': 4.006}, {'end': 2829.76, 'text': 'Um, So then that all sounded good,', 'start': 2824.966, 'duration': 4.794}, {'end': 2839.113, 'text': 'but it was all dependent on having this probability function where you want to predict the probability of a word in the context given the center word.', 'start': 2829.76, 'duration': 9.353}, {'end': 2843.398, 'text': 'And the question is how can you possibly do that?', 'start': 2839.553, 'duration': 3.845}, {'end': 2850.844, 'text': 'Um well, Um, remember, what I said is actually our model is just gonna have vector representations of words,', 'start': 2843.418, 'duration': 7.426}, {'end': 2853.346, 'text': 'and that was the only parameters of the model.', 'start': 2850.844, 'duration': 2.502}, {'end': 2855.607, 'text': "Now, that's- that's almost true.", 'start': 2853.846, 'duration': 1.761}, {'end': 2856.848, 'text': "It's not quite true.", 'start': 2855.647, 'duration': 1.201}, {'end': 2865.914, 'text': 'Um, we actually cheat slightly since we actually proposed two vector representations for each word and this makes it simpler, um, to do this.', 'start': 2856.868, 'duration': 9.046}, {'end': 2868.015, 'text': 'Um, you cannot do this.', 'start': 2866.514, 'duration': 1.501}, {'end': 2870.476, 'text': 'There are ways to get around it, but this is the simplest way to do it.', 'start': 2868.035, 'duration': 2.441}, {'end': 2873.897, 'text': "So we have one vector for a word when it's the center word.", 'start': 2870.696, 'duration': 3.201}, {'end': 2879.419, 'text': "that's predicting other words, but we have a second vector for each word when it's a context word.", 'start': 2873.897, 'duration': 5.522}, {'end': 2881.04, 'text': "So it's one of the words in context.", 'start': 2879.439, 'duration': 1.601}, {'end': 2886.502, 'text': 'So for each word type, we have these two vectors as center word, as context word.', 'start': 2881.36, 'duration': 5.142}, {'end': 2897.348, 'text': "Um, so then we're gonna work out this probability of a word in the context, given a center word purely in terms of these vectors.", 'start': 2887.142, 'duration': 10.206}, {'end': 2904.794, 'text': "And the way we do it is with this equation, um, right here, which I'll explain more of in just a moment.", 'start': 2897.828, 'duration': 6.966}, {'end': 2909.558, 'text': "So we're still on exactly the same 
situation, right?", 'start': 2905.414, 'duration': 4.144}, {'end': 2915.442, 'text': "That we're wanting to work out probabilities of words occurring in the context of our center word.", 'start': 2909.598, 'duration': 5.844}, {'end': 2921.588, 'text': 'So, the center word is C and the context words represented with O and these, some of this slide notation.', 'start': 2915.763, 'duration': 5.825}, {'end': 2926.014, 'text': "But so we're basically saying there's one kind of vector for center words,", 'start': 2921.889, 'duration': 4.125}, {'end': 2933.485, 'text': "there's a different kind of vector for context words and we're gonna work out this probabilistic prediction.", 'start': 2926.014, 'duration': 7.471}, {'end': 2935.508, 'text': 'um, in terms of these word vectors.', 'start': 2933.485, 'duration': 2.023}, {'end': 2939.181, 'text': 'Okay, So how can we do that??', 'start': 2937.239, 'duration': 1.942}, {'end': 2943.947, 'text': 'Well, the way we do it is with this um formula here,', 'start': 2939.221, 'duration': 4.726}, {'end': 2950.093, 'text': 'which is the sort of shape that you see over and over again um in deep learning with categorical stuff.', 'start': 2943.947, 'duration': 6.146}, {'end': 2957.562, 'text': 'So for the very center bit of it, the bit in orange um, or the same thing occurs in the um denominator.', 'start': 2950.474, 'duration': 7.088}, {'end': 2961.066, 'text': "what we're doing there is calculating a dot product.", 'start': 2957.962, 'duration': 3.104}, {'end': 2965.851, 'text': "So we're gonna go through the components of our vector and we're gonna multiply them together.", 'start': 2961.166, 'duration': 4.685}, {'end': 2977.483, 'text': 'And that means if, um, different words have big components of the same sign plus or minus in the same positions, the dot product will be big.', 'start': 2965.871, 'duration': 11.612}, {'end': 2984.226, 'text': "And if they have different signs or one's big and one's small, the dot product will be a lot smaller.", 'start': 2978.184, 'duration': 6.042}, {'end': 2991.489, 'text': 'So that orange part directly calculates uh, sort of a similarity between words.', 'start': 2984.546, 'duration': 6.943}, {'end': 2995.01, 'text': "where the similarity is, they're sort of vectors looking the same right?", 'start': 2991.489, 'duration': 3.521}, {'end': 2997.591, 'text': "Um, and so that's the heart of it, right?", 'start': 2995.491, 'duration': 2.1}], 'summary': 'Using vector representations, the probability of a word occurring in context is calculated through a dot product to determine similarity between words.', 'duration': 201.226, 'max_score': 2796.365, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo2796365.jpg'}, {'end': 2879.419, 'src': 'embed', 'start': 2856.868, 'weight': 3, 'content': [{'end': 2865.914, 'text': 'Um, we actually cheat slightly since we actually proposed two vector representations for each word and this makes it simpler, um, to do this.', 'start': 2856.868, 'duration': 9.046}, {'end': 2868.015, 'text': 'Um, you cannot do this.', 'start': 2866.514, 'duration': 1.501}, {'end': 2870.476, 'text': 'There are ways to get around it, but this is the simplest way to do it.', 'start': 2868.035, 'duration': 2.441}, {'end': 2873.897, 'text': "So we have one vector for a word when it's the center word.", 'start': 2870.696, 'duration': 3.201}, {'end': 2879.419, 'text': "that's predicting other words, but we have a second vector for each word when it's a context word.", 
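Written out in standard skip-gram notation (matching what the captions describe: theta is all the word vectors, T the number of corpus positions, m the window size, V the vocabulary size, v_c a center-word vector, u_o an outside-word vector):

```latex
% Likelihood of the corpus, and the average negative log-likelihood
% objective obtained by taking logs (products become sums):
L(\theta) = \prod_{t=1}^{T} \prod_{\substack{-m \le j \le m \\ j \neq 0}}
            P(w_{t+j} \mid w_t; \theta)
\qquad
J(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-m \le j \le m \\ j \neq 0}}
            \log P(w_{t+j} \mid w_t; \theta)

% Softmax probability of an outside word o given the center word c:
P(o \mid c) = \frac{\exp(u_o^{\top} v_c)}{\sum_{w=1}^{V} \exp(u_w^{\top} v_c)}
```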
'start': 2873.897, 'duration': 5.522}], 'summary': 'Proposed two vector representations for each word to simplify prediction.', 'duration': 22.551, 'max_score': 2856.868, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo2856868.jpg'}, {'end': 3063.307, 'src': 'embed', 'start': 3040.058, 'weight': 4, 'content': [{'end': 3047.189, 'text': 'We sum up what this quantity is, for every different word in our vocabulary, um, and we divide through by it.', 'start': 3040.058, 'duration': 7.131}, {'end': 3051.878, 'text': 'And so that normalizes things and turns them into a probability distribution.', 'start': 3047.449, 'duration': 4.429}, {'end': 3055.801, 'text': "Yeah So there's sort of in practice, there are two parts.", 'start': 3052.918, 'duration': 2.883}, {'end': 3063.307, 'text': "There's the orange part which is this idea of using dot product and vector space as our similarity measure between words.", 'start': 3056.061, 'duration': 7.246}], 'summary': 'Normalizing data for probability distribution, using dot product for word similarity.', 'duration': 23.249, 'max_score': 3040.058, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo3040058.jpg'}, {'end': 3194.718, 'src': 'embed', 'start': 3166.008, 'weight': 0, 'content': [{'end': 3170.91, 'text': 'And so what we want to be able to do is, then um,', 'start': 3166.008, 'duration': 4.902}, {'end': 3180.573, 'text': 'move our vector representations of words around so that they are good at predicting what words occur in the context of other words.', 'start': 3170.91, 'duration': 9.663}, {'end': 3186.155, 'text': "Um, And so, at this point, what we're gonna do is optimization.", 'start': 3181.253, 'duration': 4.902}, {'end': 3190.176, 'text': 'So, we have vector components of different words.', 'start': 3186.535, 'duration': 3.641}, {'end': 3194.718, 'text': "We have a very high dimensional space again, but here I've just got two for the picture.", 'start': 3190.476, 'duration': 4.242}], 'summary': 'Optimizing vector representations of words for predicting word context.', 'duration': 28.71, 'max_score': 3166.008, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo3166008.jpg'}], 'start': 2522.128, 'title': 'Word vector representations', 'summary': 'Discusses representing words as vectors, predicting word context, utilizing probability models, and optimizing word representations for better predictions.', 'chapters': [{'end': 2810.23, 'start': 2522.128, 'title': 'Word vector representation in context prediction', 'summary': 'Discusses the process of representing words as vectors and iteratively predicting the context of words to achieve a good word vector space, using a probability model to predict words around a center word and an objective function to minimize and optimize the vector representations of words.', 'duration': 288.102, 'highlights': ['The process involves representing words as vectors and iteratively predicting the context of words to achieve a good word vector space. The speaker explains the process of representing words as vectors and iteratively predicting the context of words to achieve a good word vector space.', 'Using a probability model to predict words around a center word. 
The chapter discusses the use of a probability model to predict the words around a center word, aiming to optimize the vector representations of words.', 'Defining an objective function to minimize and optimize the vector representations of words. The chapter explains the process of defining an objective function to minimize and optimize the vector representations of words.']}, {'end': 3294, 'start': 2811.286, 'title': 'Word vector representations and probability prediction', 'summary': 'Discusses the use of vector representations of words to predict probabilities of words occurring in the context of other words, utilizing dot product similarity, exponential transformation, and softmax distribution, and the optimization process to adjust word representations for better predictions.', 'duration': 482.714, 'highlights': ['The model uses two vector representations for each word, one for the center word and one for the context word, simplifying the prediction process. Having two vector representations for each word simplifies the prediction process, making it easier to calculate the probability of a word in the context given the center word.', 'The probability prediction is calculated using dot product to measure the similarity between word vectors, followed by an exponential transformation to obtain positive numbers for forming a probability distribution using softmax function. The prediction involves calculating the dot product for similarity measurement between word vectors, followed by an exponential transformation to ensure positive numbers for forming a probability distribution using softmax function.', 'The optimization process involves adjusting the vector representations of words to minimize the loss function, aiming to better predict words occurring in the context of other words. 
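A runnable toy version of that dot-product, exponentiate, then normalize pipeline; all sizes and the random vectors here are illustrative stand-ins, not trained parameters:

```python
import numpy as np

# Toy word2vec probability computation, as described above.
rng = np.random.default_rng(0)
vocab_size, dim = 10, 4
U = rng.normal(size=(vocab_size, dim))  # context ("outside") vectors u_w
v_c = rng.normal(size=dim)              # center-word vector v_c

scores = U @ v_c                        # dot products: similarity to v_c
probs = np.exp(scores)                  # exponentiate -> all positive
probs /= probs.sum()                    # normalize -> P(o | c) over the vocab
assert np.isclose(probs.sum(), 1.0)     # a valid probability distribution
```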
The optimization process entails adjusting the vector representations of words to minimize the loss function, aiming to enhance the prediction of words occurring in the context of other words.']}], 'duration': 771.872, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo2522128.jpg', 'highlights': ['The process involves representing words as vectors and iteratively predicting the context of words to achieve a good word vector space.', 'Using a probability model to predict words around a center word, aiming to optimize the vector representations of words.', 'Defining an objective function to minimize and optimize the vector representations of words.', 'The model uses two vector representations for each word, one for the center word and one for the context word, simplifying the prediction process.', 'The probability prediction is calculated using the dot product to measure the similarity between word vectors, followed by an exponential transformation to obtain positive numbers for forming a probability distribution using the softmax function.', 'The optimization process involves adjusting the vector representations of words to minimize the loss function, aiming to better predict words occurring in the context of other words.']}, {'end': 3644.689, 'segs': [{'end': 3366.286, 'src': 'embed', 'start': 3320.715, 'weight': 0, 'content': [{'end': 3332.76, 'text': "u and v vectors is we're literally gonna start with a random vector for each word and then we're iteratively going to change those vectors a little bit as we learn.", 'start': 3320.715, 'duration': 12.045}, {'end': 3336.182, 'text': "And the way we're gonna work out how to change them is,", 'start': 3333.18, 'duration': 3.002}, {'end': 3342.406, 'text': "we're gonna say, I want to do optimization, and that is going to be implemented as, okay,", 'start': 3336.182, 'duration': 6.224}, {'end': 3344.628, 'text': 'we have the current vectors for each word.', 'start': 3342.406, 'duration': 2.222}, {'end': 3359.92, 'text': 'Let me do some calculus to work out how I could change the word vectors um to mean that the word vectors would calculate a higher probability for the words that actually occur in contexts of this center word.', 'start': 3345.028, 'duration': 14.892}, {'end': 3366.286, 'text': "And we will do that and we'll do it again and again and again, and then we'll eventually end up with good word vectors.", 'start': 3360.361, 'duration': 5.925}], 'summary': 'Iteratively change word vectors to optimize for higher probability in contexts, resulting in good word vectors.', 'duration': 45.571, 'max_score': 3320.715, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo3320715.jpg'}, {'end': 3490.452, 'src': 'embed', 'start': 3448.038, 'weight': 1, 'content': [{'end': 3458.324, 'text': "Okay, So we had that and then we'd had this formula that the probability of the outside word, given the context word,", 'start': 3448.038, 'duration': 10.286}, {'end': 3460.985, 'text': 'is this formula we just went through:', 'start': 3458.324, 'duration': 2.661}, {'end': 3473.342, 'text': 'exp(u_o^T v_c) over the sum from w = 1 to the vocabulary size of', 'start': 3460.985, 'duration': 12.357}, {'end': 3477.265, 'text': 'exp(u_w^T v_c).', 'start': 3473.342, 'duration': 3.923}, {'end': 3479.266, 'text': "Okay So that's sort of our model.", 'start': 3477.265, 'duration': 2.001}, {'end': 3482.928, 'text': 'We want to minimize this.', 'start': 3479.646, 'duration': 3.282}, {'end': 3490.452, 'text': 'So we want to minimize this and we want to minimize that by changing these parameters.', 'start': 3484.468, 'duration': 5.984}], 'summary': 'Model aims to minimize the probability formula by changing parameters.', 'duration': 42.414, 'max_score': 3448.038, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo3448038.jpg'},
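To make the spoken formula above concrete, here is a minimal NumPy sketch of that softmax probability, P(o|c) = exp(u_o^T v_c) / sum_{w=1}^{V} exp(u_w^T v_c). The toy vocabulary size, dimensionality, and the max-subtraction trick for numerical stability are illustrative assumptions, not anything from the lecture.

```python
import numpy as np

# Toy sizes (illustrative assumptions): vocabulary V, embedding dimension d.
V, d = 10, 4
rng = np.random.default_rng(0)
U = rng.normal(size=(V, d))   # one "outside" vector u_w per vocabulary word
v_c = rng.normal(size=d)      # the center word's vector

def prob_outside_given_center(U, v_c):
    """P(o | c) for every candidate outside word o, via the softmax of dot products."""
    scores = U @ v_c                      # u_w^T v_c: dot product as similarity
    scores -= scores.max()                # subtract the max; softmax is unchanged
    exp_scores = np.exp(scores)           # exponentiate: everything becomes positive
    return exp_scores / exp_scores.sum()  # normalize: a probability distribution

p = prob_outside_given_center(U, v_c)
assert np.isclose(p.sum(), 1.0)
```

Exponentiating makes every score positive and dividing by the total makes the scores sum to one, which is exactly the two-step "exp, then normalize" recipe described in the segments above.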
{'end': 3551.976, 'src': 'embed', 'start': 3513.507, 'weight': 3, 'content': [{'end': 3518.917, 'text': "because if we can work out which way is downhill, we're just gonna walk downhill and our model will get better.", 'start': 3513.507, 'duration': 5.41}, {'end': 3525.028, 'text': "So, we're gonna take derivatives and work out what direction downhill is and then we wanna walk that way.", 'start': 3519.237, 'duration': 5.791}, {'end': 3541.953, 'text': "Yeah So, well, so I'm wanting to achieve this.", 'start': 3525.329, 'duration': 16.624}, {'end': 3551.976, 'text': 'Um, what I want to achieve for my distributional notion of meaning is to have a meaningful word vector,', 'start': 3542.733, 'duration': 9.243}], 'summary': 'Using derivatives to optimize model for distributional notion of meaning.', 'duration': 38.469, 'max_score': 3513.507, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo3513507.jpg'}], 'start': 3294.02, 'title': 'Word vector calculation and optimization', 'summary': 'Explains the calculation of word vectors u and v, their optimization through calculus, and the attainment of good word vectors. it also discusses the optimization of a model for the distributional notion of meaning, aiming to accurately estimate high and low probability words.', 'chapters': [{'end': 3383.745, 'start': 3294.02, 'title': 'Word vector calculation and optimization', 'summary': 'Explains the process of calculating word vectors u and v, starting with random vectors and iteratively changing them to optimize higher probability for words occurring in contexts, the implementation of optimization through calculus, and the eventual attainment of good word vectors.', 'duration': 89.725, 'highlights': ['The process of calculating word vectors u and v involves starting with random vectors for each word and iteratively changing them to optimize higher probability for words occurring in contexts.', 'Implementation of optimization is achieved through calculus to change the word vectors to calculate a higher probability for words occurring in contexts of the center word.', 'Thorough explanation of the high-level recipe for maximizing the formula to attain good word vectors.', 'The importance of understanding the concept and the high-level recipe for word vector calculation and optimization.']}, {'end': 3644.689, 'start': 3383.745, 'title': 'Optimizing distributional notion of meaning', 'summary': 'Discusses the optimization of a model for the distributional notion of meaning, aiming to accurately estimate high and low probability words in a given context by minimizing the parameters through calculus and derivatives.', 'duration': 260.944, 'highlights': ["The model aims to accurately estimate high and low probability words in a given context by minimizing the parameters through calculus and derivatives. The goal is to achieve a meaningful word vector that accurately estimates high probability for words occurring in its context and low probability for words that don't typically occur, achieved by minimizing parameters through calculus and derivatives.", "The model's probability distribution is limited in providing precise predictions for all words in a given context, typically offering a 5% chance at best for one word. The model's limitation includes providing at best a 5% chance for one word in the context due to using a simple probability distribution to predict multiple words, thus not being able to accurately predict every word with high certainty.", 'The approach involves walking downhill in the parameter space to improve the model. The process involves determining the slope of the space, identifying the downhill direction, and optimizing the model by walking downhill in the parameter space to achieve improvement.']}], 'duration': 350.669, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo3294020.jpg', 'highlights': ['The process of calculating word vectors u and v involves starting with random vectors for each word and iteratively changing them to optimize higher probability for words occurring in contexts.', 'The model aims to accurately estimate high and low probability words in a given context by minimizing the parameters through calculus and derivatives.', 'The importance of understanding the concept and the high-level recipe for word vector calculation and optimization.', 'The approach involves walking downhill in the parameter space to improve the model.']},
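Since the lecture keeps returning to the "start random, then walk downhill" recipe, here is a minimal sketch of that loop. The toy sizes, the made-up list of (center, outside) index pairs standing in for a corpus, and the learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
V, d = 10, 4
U = rng.normal(scale=0.1, size=(V, d))    # outside-word vectors, randomly initialized
Vc = rng.normal(scale=0.1, size=(V, d))   # center-word vectors, randomly initialized
pairs = [(0, 1), (0, 2), (3, 4), (3, 1)]  # hypothetical observed (center, outside) pairs
lr = 0.05                                 # learning rate (assumed)

for epoch in range(100):
    for c, o in pairs:
        v = Vc[c].copy()
        scores = U @ v
        p = np.exp(scores - scores.max())
        p /= p.sum()                  # P(x | c) for every word x under the current model
        grad_v = U[o] - p @ U         # d/dv_c log P(o|c): observed minus expected
        grad_U = -np.outer(p, v)      # every u_x gets pulled by -P(x|c) * v_c ...
        grad_U[o] += v                # ... and the observed word gets an extra +v_c
        Vc[c] += lr * grad_v          # gradient ascent on log prob = downhill on the loss
        U += lr * grad_U
```

The step for the center vector uses exactly the "observed minus expected" gradient that the lecture derives a few segments below.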
{'end': 4365.073, 'segs': [{'end': 3697.944, 'src': 'embed', 'start': 3644.689, 'weight': 1, 'content': [{'end': 3654.394, 'text': 'but nevertheless we want to capture the fact that you know withdrawal is much more likely um to occur near the word bank,', 'start': 3644.689, 'duration': 9.705}, {'end': 3657.155, 'text': 'um than something like football.', 'start': 3654.394, 'duration': 2.761}, {'end': 3659.916, 'text': "Um, that's, you know, basically what our goal is.", 'start': 3657.615, 'duration': 2.301}, {'end': 3664.582, 'text': 'Okay Um, yeah.', 'start': 3662.06, 'duration': 2.522}, {'end': 3672.548, 'text': 'So we want to maximize this by minimizing this, which means we then want to do some calculus, um, to work this out.', 'start': 3664.602, 'duration': 7.946}, {'end': 3676.751, 'text': "So what we're then gonna do is that we're going to say well,", 'start': 3672.588, 'duration': 4.163}, {'end': 3688.138, 'text': "these parameters are our word vectors and we're gonna sort of want to move these word vectors um to um, work things out as to how to um walk downhill.", 'start': 3676.751, 'duration': 11.387}, {'end': 3697.944, 'text': "So the case that I'm gonna do now is gonna look at the parameters of this center word VC and work out how to do things with respect to it.", 'start': 3688.238, 'duration': 9.706}], 'summary': "The goal is to capture that 'withdrawal' is much more likely to occur near 'bank' than near 'football', by using calculus to move the word vectors.", 'duration': 53.255, 'max_score': 3644.689, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo3644689.jpg'}, {'end': 4203.669, 'src': 'embed', 'start': 4170.317, 'weight': 0, 'content': [{'end': 4174.6, 'text': "Okay So by doing the chain rule twice, we've got that.", 'start': 4170.317, 'duration': 4.283}, {'end': 4186.45, 'text': 'And so now, if we put it together, you know the 
derivative with respect to v_c of the whole thing, this log of the probability of O given C, right?', 'start': 4175.14, 'duration': 11.31}, {'end': 4194.163, 'text': "That for the numerator it was just UO, and then we're subtracting.", 'start': 4186.51, 'duration': 7.653}, {'end': 4203.669, 'text': 'we had this term here, um, which is sort of a denominator, and then we have this term here, which is a numerator.', 'start': 4194.163, 'duration': 9.506}], 'summary': 'Using the chain rule twice, the derivative with respect to v_c of the log of the probability of o given c has u_o as the numerator term, minus a term coming from the denominator.', 'duration': 33.352, 'max_score': 4170.317, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo4170317.jpg'}], 'start': 3644.689, 'title': 'Calculus in nlp', 'summary': "Delves into the application of calculus in optimizing word vectors in nlp, focusing on the likelihood of 'withdrawal' occurring near the word 'bank' and the need to minimize parameters. it also discusses finding partial derivatives with respect to a vector representation and the resulting probability distribution for predicting word probabilities.", 'chapters': [{'end': 3711.512, 'start': 3644.689, 'title': 'Word vector calculus in nlp', 'summary': "Discusses the application of calculus in optimizing word vectors in nlp, emphasizing the likelihood of 'withdrawal' occurring near the word 'bank' and the need to minimize parameters for optimization.", 'duration': 66.823, 'highlights': ["The word 'withdrawal' is much more likely to occur near the word 'bank' than something like 'football'.", 'The goal is to maximize by minimizing, which involves using calculus for optimization.', 'Moving word vectors to work things out and walk downhill is a key part of the process.', 'Working out the slope with respect to the UO vector is also important for optimization.']}, {'end': 4365.073, 'start': 3711.892, 'title': 'Partial derivative of logarithmic functions', 'summary': 'Explains the process of finding the partial derivative with respect to a vector representation, involving multivariate calculus and the chain rule, resulting in the probability distribution for predicting the probability of words.', 'duration': 653.181, 'highlights': ['The process involves finding the partial derivative of the log of the numerator and the denominator: the numerator contributes u_o, and the denominator contributes the sum over x of the probability of x given c times u_x.', 'The chain rule is applied to find the derivative of the inside part, leading to the probability distribution for predicting the probability of words.', 'The method of finding the partial derivative involves multivariate calculus and the application of the chain rule, resulting in the probability distribution for predicting the probability of words.', 'The process involves rewriting the log of the numerator and the denominator: the numerator contributes u_o, and the denominator contributes the sum over x of the probability of x given c times u_x.', 'The application of the chain rule yields the probability distribution for predicting the probability of words, and a weighted average of the representation of each word, multiplied by its probability under the current model.']}], 'duration': 720.384, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo3644689.jpg', 'highlights': ['The process involves finding the partial derivative of the log of the numerator and the denominator: the numerator contributes u_o, and the denominator contributes the sum over x of the probability of x given c times u_x.', "The word 'withdrawal' is much more likely to occur near the word 'bank' than something like 'football'.", 'The goal is to maximize by minimizing, which involves using calculus for optimization.', 'The chain rule is applied to find the derivative of the inside part, leading to the probability distribution for predicting the probability of words.', 'Moving word vectors to work things out and walk downhill is a key part of the process.']},
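To sanity-check the derivative just derived, d/dv_c log P(o|c) = u_o minus the sum over x of P(x|c) times u_x, here is a small sketch comparing the analytic form against a finite-difference estimate; the random toy data is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
V, d = 8, 3
U = rng.normal(size=(V, d))
v_c = rng.normal(size=d)
o = 2  # arbitrary choice of observed outside word

def log_prob(v):
    """log P(o | c) as a function of the center vector v."""
    scores = U @ v
    return scores[o] - np.log(np.exp(scores).sum())

# Analytic gradient: u_o minus the probability-weighted average of all u_x.
p = np.exp(U @ v_c)
p /= p.sum()
analytic = U[o] - p @ U

# Central finite-difference estimate of the same gradient.
eps = 1e-6
numeric = np.array([(log_prob(v_c + eps * e) - log_prob(v_c - eps * e)) / (2 * eps)
                    for e in np.eye(d)])
assert np.allclose(analytic, numeric, atol=1e-5)
```

The second term is the expected context vector under the current model, so the gradient is literally "observed minus expected", which is the point the next segment makes in words.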
{'end': 4905.617, 'segs': [{'end': 4423.265, 'src': 'embed', 'start': 4365.434, 'weight': 0, 'content': [{'end': 4371.938, 'text': "And so we're taking the difference between the expected context word and the actual context word that showed up.", 'start': 4365.434, 'duration': 6.504}, {'end': 4379.424, 'text': 'And that difference then turns out to exactly give us the slope as to which direction we should be walking,', 'start': 4372.319, 'duration': 7.105}, {'end': 4385.623, 'text': "changing the word's representation in order to improve our model's ability to predict.", 'start': 4379.684, 'duration': 5.939}, {'end': 4394.13, 'text': "Okay Um, so we'll, um, assignment two if, um, yeah.", 'start': 4387.586, 'duration': 6.544}, {'end': 4403.957, 'text': "So, um, it'd be a great exercise for you guys, um, to try and do that for the, um, wait, I did the center words.", 'start': 4394.17, 'duration': 9.787}, {'end': 4410.761, 'text': 'Try and do it for the context words as well and show that you can do the same kind of piece of math and have it work out.', 'start': 4404.217, 'duration': 6.544}, {'end': 4423.265, 'text': "Um, if I've just got a few minutes left at the end, um, what I just wanted to show you, if I can get all of this to work right.", 'start': 4411.302, 'duration': 11.963}], 'summary': 'Analyzing context word differences to improve word representation for prediction model.', 'duration': 57.831, 'max_score': 4365.434, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo4365434.jpg'}, {'end': 4499.563, 'src': 'embed', 'start': 4471.903, 'weight': 2, 'content': [{'end': 4474.805, 'text': 'Is that readable? 
Okay.', 'start': 4471.903, 'duration': 2.902}, {'end': 4476.687, 'text': 'Um, so, right.', 'start': 4475.206, 'duration': 1.481}, {'end': 4481.811, 'text': 'So, so NumPy is the sort of, um, do-math package in Python.', 'start': 4476.747, 'duration': 5.064}, {'end': 4484.273, 'text': "You'll want to know about that if you don't know about it.", 'start': 4481.991, 'duration': 2.282}, {'end': 4489.017, 'text': 'Um, Matplotlib is sort of the, one of the, the most basic graphing package.', 'start': 4484.693, 'duration': 4.324}, {'end': 4491.238, 'text': "If you don't know about that, you're gonna want to know about it.", 'start': 4489.077, 'duration': 2.161}, {'end': 4499.563, 'text': 'Um, This is sort of an IPython or Jupyter special that lets you have an interactive matplotlib, um, inside.', 'start': 4491.599, 'duration': 7.964}], 'summary': 'NumPy is a math package in Python, and Matplotlib is a basic graphing package, essential for data analysis.', 'duration': 27.66, 'max_score': 4471.903, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo4471903.jpg'}, {'end': 4565.8, 'src': 'embed', 'start': 4513.37, 'weight': 3, 'content': [{'end': 4520.896, 'text': 'Gensim is kind of a word similarity package which started off um with um methods like latent Dirichlet allocation,', 'start': 4513.37, 'duration': 7.526}, {'end': 4523.239, 'text': 'if you know about that from modeling word similarities.', 'start': 4520.896, 'duration': 2.343}, {'end': 4528.525, 'text': "But it's sort of grown as a good package, um, for doing, um, word vectors as well.", 'start': 4523.259, 'duration': 5.266}, {'end': 4532.851, 'text': "So it's quite often used for word vectors and word similarities.", 'start': 4528.645, 'duration': 4.206}, {'end': 4535.854, 'text': "It's sort of efficient for doing things at large scale.", 'start': 4532.931, 'duration': 2.923}, {'end': 4537.536, 'text': 'Um, Yeah.', 'start': 4535.874, 'duration': 1.662}, {'end': 4540.779, 'text': "So, I haven't yet told you about this, but will next time.", 'start': 4537.937, 'duration': 2.842}, {'end': 4545.904, 'text': 'We have our own homegrown form of word vectors, which are the GloVe word vectors.', 'start': 4541.08, 'duration': 4.824}, {'end': 4555.333, 'text': "Um, I'm using them not because it really matters for what I'm showing, but, you know, these vectors are conveniently small.", 'start': 4546.405, 'duration': 8.928}, {'end': 4565.8, 'text': 'Um, it turns out that the vectors that, um, Facebook and Google distribute, uh, have an extremely large vocabulary and are extremely high-dimensional.', 'start': 4555.353, 'duration': 10.447}], 'summary': 'Gensim is a package for word similarity and vectors, efficient for large-scale tasks.', 'duration': 52.43, 'max_score': 4513.37, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo4513370.jpg'}, {'end': 4615.186, 'src': 'embed', 'start': 4588.152, 'weight': 5, 'content': [{'end': 4593.356, 'text': 'but they actually provide the utility that converts the GloVe file format to the Word2Vec file format.', 'start': 4588.152, 'duration': 5.204}, {'end': 4604.343, 'text': "So I've done that, um, and then I've loaded a pre-trained model of word vectors, um, And so this is what they call a keyed vector.", 'start': 4593.396, 'duration': 10.947}, {'end': 4611.665, 'text': "And so a keyed vector is nothing fancy, it's just you have words like potato and there's a vector that hangs off each one.", 'start': 4604.643, 'duration': 7.022}, {'end': 4615.186, 'text': "So it's really just sort of a big dictionary with a vector for each thing.", 'start': 4611.705, 'duration': 3.481}], 'summary': 'Converted GloVe format to Word2Vec, loaded pre-trained model of word vectors.', 'duration': 27.034, 'max_score': 4588.152, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo4588152.jpg'},
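As a sketch of the workflow just described, converting the GloVe text file and loading it as keyed vectors, something like the following should work with Gensim; the file names are illustrative assumptions, and the glove2word2vec conversion script is the one shipped in gensim.scripts.

```python
# A minimal sketch of the demo described above, assuming a local copy of the
# GloVe vectors (file names here are assumptions, not from the lecture).
from gensim.scripts.glove2word2vec import glove2word2vec
from gensim.models import KeyedVectors

# Convert GloVe's plain-text format to the word2vec text format.
glove2word2vec('glove.6B.100d.txt', 'glove.6B.100d.word2vec.txt')

# A "keyed vector" is essentially a big dictionary: word -> vector.
model = KeyedVectors.load_word2vec_format('glove.6B.100d.word2vec.txt')

print(model['potato'][:10])          # a few components of one word's vector
print(model.most_similar('banana'))  # nearest neighbors by cosine similarity
```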
{'end': 4770.89, 'src': 'embed', 'start': 4742.712, 'weight': 6, 'content': [{'end': 4753.218, 'text': 'and then we could add to it the meaning of woman, and then we could say which word in our vector space is most similar in meaning to that um word,', 'start': 4742.712, 'duration': 10.506}, {'end': 4755.779, 'text': 'and that would be a way of sort of doing analogies.', 'start': 4753.218, 'duration': 2.561}, {'end': 4761.803, 'text': "We'd be able to do the, um, analogy man is to king as woman is to what.", 'start': 4755.819, 'duration': 5.984}, {'end': 4770.89, 'text': "And so the way we're gonna do that is to say we want to be similar to king and woman because they're both positive ones and far away from man.", 'start': 4762.363, 'duration': 8.527}], 'summary': "Using vector space, we can find word analogies, such as 'man is to king as woman is to what', by identifying the most similar words.", 'duration': 28.178, 'max_score': 4742.712, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo4742712.jpg'}], 'start': 4365.434, 'title': 'Word representation improvement and intro to data science tools', 'summary': "Discusses improving word representation by calculating the difference between expected and actual context words, aiming to enhance predictive ability, and demonstrates various data science tools including numpy, matplotlib, scikit-learn, and gensim, focusing on gensim's word similarity and vector manipulation capabilities with practical examples.", 'chapters': [{'end': 4423.265, 'start': 4365.434, 'title': 'Word representation improvement', 'summary': "discusses the calculation of the difference between expected and actual context words to determine the slope for changing word representation, aimed at improving the model's predictive ability, with a challenge to apply the same concept to context words.", 'duration': 57.831, 'highlights': ["The calculation of the difference between expected and actual context words determines the slope for changing word representation, aimed at improving the model's predictive ability.", 'A challenge is presented to apply the same concept to context words.']}, {'end': 4905.617, 'start': 4424.847, 'title': 'Intro to data science tools', 'summary': "demonstrates the use of various data science tools such as numpy, matplotlib, scikit-learn, and gensim, with a focus on gensim's word similarity and vector manipulation capabilities, showcasing practical examples of word similarities and analogies.", 'duration': 480.77, 'highlights': ['The chapter introduces essential data science tools like NumPy, Matplotlib, scikit-learn, and Gensim, emphasizing their importance for mathematical operations and graphing (relevance score: 5)', 'Gensim is highlighted as a word similarity package that has evolved to efficiently handle word vectors and similarities, particularly useful for large-scale operations (relevance score: 4)', 'The instructor explains the use of GloVe word vectors, emphasizing their convenience due to their smaller size compared to vectors distributed by Facebook and Google, enabling efficient processing on a laptop (relevance score: 3)', "The demonstration of Gensim's capability to convert GloVe file format to Word2Vec format and load pre-trained word vectors is showcased, providing practical insights into the implementation of word vector models (relevance score: 2)", "The practical application of word similarities and analogies using Gensim's word vectors is illustrated, demonstrating the ability to find similar words and perform analogies such as 'man is to king as woman is to what', showcasing the effective manipulation of vector space (relevance score: 1)"]}], 'duration': 540.183, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rXD5-xhemo/pics/8rXD5-xhemo4365434.jpg'}],
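The "similar to king and woman, far away from man" recipe maps directly onto Gensim's most_similar with positive and negative word lists. A sketch, reusing the model loaded in the previous snippet:

```python
# Analogy "man is to king as woman is to ?":
# positive words pull the answer closer, negative words push it away.
for word, cosine in model.most_similar(positive=['king', 'woman'],
                                       negative=['man'], topn=3):
    print(f'{word}: {cosine:.3f}')   # 'queen' is typically the top result with GloVe
```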
'highlights': ['The course aims to teach effective, modern methods for deep learning and techniques like recurrent networks and attention for NLP models.', 'The practical aspect of the course involves teaching students how to build systems for NLP problems like word meaning, dependency parsing, and machine translation.', 'The course has been updated with new content and will use PyTorch instead of TensorFlow, with a shift to five one-week assignments and no midterm, with a focus on using Microsoft Azure for GPU computing.', 'The grading system involves five assignments, a final project with custom options, and a participation component, emphasizing original work and adherence to collaboration policies.', 'The problem sets will include assignments using Python, NumPy, and PyTorch, with additional support provided for Python and NumPy review sessions and a shift to GPU computing using Microsoft Azure.', 'The plan for the class includes a brief introduction, course logistics, and a discussion on word vectors and the word2vec algorithm, indicating a focus on practical and technical aspects of natural language processing with deep learning.', 'Most of modern deep learning is powered by the parallel scalability of GPUs.', 'Two options for the final project: a default question answering system over the SQuAD dataset and the option to propose a custom final project, along with the team size allowed for the final project.', 'The Word2Vec algorithm, introduced by Tomas Mikolov in 2013, 
had a significant impact on the field of NLP and led to a shift towards neural networks.', 'The distributed representation of words involves using a numeric vector to represent the meaning of each word, with typical dimensions ranging from 50 to 300, providing a denser, non-zero vector representation.', 'The process involves representing words as vectors and iteratively predicting the context of words to achieve a good word vector space.', 'The model uses two vector representations for each word, one for the center word and one for the context word, simplifying the prediction process.', 'The process of calculating word vectors u and v involves starting with random vectors for each word and iteratively changing them to optimize higher probability for words occurring in contexts.', 'The process involves finding the partial derivative of the log of the numerator and the denominator: the numerator contributes u_o, and the denominator contributes the sum over x of the probability of x given c times u_x.', "The calculation of the difference between expected and actual context words determines the slope for changing word representation, aimed at improving the model's predictive ability.", 'The chapter introduces essential data science tools like NumPy, Matplotlib, scikit-learn, and Gensim, emphasizing their importance for mathematical operations and graphing.']}
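For reference, the derivation sketched across the transcript above can be written out compactly, with notation as in the lecture: v_c is the center-word vector, u_w the outside-word vectors, and V the vocabulary size.

```latex
P(o \mid c) \;=\; \frac{\exp(u_o^\top v_c)}{\sum_{w=1}^{V} \exp(u_w^\top v_c)}
\qquad\Longrightarrow\qquad
\frac{\partial}{\partial v_c} \log P(o \mid c)
\;=\; u_o \;-\; \sum_{x=1}^{V} P(x \mid c)\, u_x
```

That is, the observed context vector minus the model's expected context vector, which is exactly the "difference between the expected context word and the actual context word that showed up" quoted in the segments above.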