title
Big Data and Hadoop 1 | Hadoop Tutorial 1 | Big Data Tutorial 1 | Hadoop Tutorial for Beginners - 1

description
( Hadoop Training: https://www.edureka.co/hadoop ) Watch our New and Updated Hadoop Tutorial For Beginners: https://goo.gl/xeEV6m Check our Hadoop Tutorial blog series: https://goo.gl/LFesy8 This is Part 1 of the 8-week Big Data and Hadoop course. The 3-hour interactive live class covers What is Big Data, What is Hadoop and Why Hadoop? We also walk through the details of the Hadoop Distributed File System (HDFS). The tutorial covers the Name Node, Data Nodes, the Secondary Name Node and the need for Hadoop in detail, and goes into concepts like Rack Awareness, Data Replication, and Reading and Writing on HDFS. We will also show how to set up the Cloudera VM on your machine. #edureka #edurekaBigData #edurekaHadoop #BigData #Hadoop #BigDataTutorial #HadoopTutorial #BigDataTraining #HadoopTraining - - - - - - - - - - - - - - Subscribe to our channel to get video updates. Hit the subscribe button above: https://goo.gl/6ohpTV Instagram: https://www.instagram.com/edureka_learning Facebook: https://www.facebook.com/edurekaIN/ Twitter: https://twitter.com/edurekain LinkedIn: https://www.linkedin.com/company/edureka Telegram: https://t.me/edurekaupdates - - - - - - - - - - - - - - - - - How it Works? 1. This is an 8-week instructor-led online course. 2. We have a 3-hour live and interactive session every Sunday. 3. We have 3 hours of practical work involving lab assignments, case studies and projects every week, which can be done at your own pace. We can also provide you remote access to our Hadoop cluster for doing the practicals. 4. We have 24x7 one-on-one LIVE technical support to help you with any problems you might face or any clarifications you may require during the course. 5. At the end of the training you will undergo a 2-hour LIVE practical exam, based on which we will provide you a grade and a verifiable certificate! - - - - - - - - - - - - - - About the Course The Big Data and Hadoop training course is designed to provide the knowledge and skills to become a successful Hadoop Developer. In-depth knowledge of concepts such as the Hadoop Distributed File System, setting up the Hadoop cluster, MapReduce, Pig, Hive, HBase, ZooKeeper, Sqoop etc. will be covered in the course. - - - - - - - - - - - - - - Course Objectives After the completion of the Hadoop Course at Edureka, you should be able to: Master the concepts of the Hadoop Distributed File System. Understand cluster setup and installation. Understand MapReduce and functional programming. Understand how Pig is tightly coupled with MapReduce. Learn how to use Hive, how to load data into Hive and how to query data from Hive. Implement HBase, MapReduce integration, advanced usage and advanced indexing. Have a good understanding of the ZooKeeper service and Sqoop. Develop a working Hadoop architecture. - - - - - - - - - - - - - - Why Learn Hadoop? Big Data! A Worldwide Problem? According to Wikipedia, "Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications." In simpler terms, Big Data is a term given to the large volumes of data that organizations store and process. However, it is becoming very difficult for companies to store, retrieve and process this ever-increasing data. Any company that manages its data well has nothing stopping it from becoming the next BIG success! The problem lies in the use of traditional systems to store enormous data.
Though these systems were a success a few years ago, with the increasing amount and complexity of data they are fast becoming obsolete. The good news is Hadoop, which is nothing less than a panacea for companies working with BIG DATA in a variety of applications, and it has become integral to storing, handling, evaluating and retrieving hundreds of terabytes, and even petabytes, of data. - - - - - - - - - - - - - - Some of the top companies using Hadoop: The importance of Hadoop is evident from the fact that many global MNCs use Hadoop and consider it an integral part of their functioning - companies such as Yahoo and Facebook! On February 19, 2008, Yahoo! Inc. launched the world's largest Hadoop production application. The Yahoo! Search Webmap is a Hadoop application that runs on a Linux cluster with more than 10,000 cores and generates data that is now used in every Yahoo! Web search query. Opportunities for Hadoopers! Opportunities for Hadoopers are infinite - from Hadoop Developer to Hadoop Tester or Hadoop Architect, and so on. If cracking and managing BIG Data is your passion in life, then think no more: join Edureka's Hadoop Online course and carve a niche for yourself! Happy Hadooping! For more information, please write back to us at sales@edureka.co or call us at IND: 9606058406 / US: 18338555775 (toll-free).
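A quick worked version of the I/O-throughput exercise from the session (the 100 MB/s per channel and four channels per machine are the numbers used in class; 1 TB is taken as 1,048,576 MB): reading 1 TB from a single machine takes 1,048,576 MB ÷ (4 × 100 MB/s) ≈ 2,621 s, roughly 43 minutes, while spreading the same data across ten machines reading in parallel cuts that to about 4.4 minutes - the core motivation given for a distributed file system.

The class also demonstrates reading and writing on HDFS after setting up the Cloudera VM. The snippet below is a minimal sketch of that write-once/read-many pattern using the standard Hadoop Java FileSystem API; the name-node URI hdfs://localhost:8020 and the path /user/edureka/sample.txt are assumptions for illustration, not values from the course.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed name-node URI; on a real cluster this comes from core-site.xml.
        URI nameNode = URI.create("hdfs://localhost:8020");

        try (FileSystem fs = FileSystem.get(nameNode, conf)) {
            // Hypothetical path, used only for illustration.
            Path file = new Path("/user/edureka/sample.txt");

            // Write once: the file is created, written sequentially, and closed.
            try (FSDataOutputStream out = fs.create(file, /* overwrite = */ true)) {
                out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
            }

            // Read many times: streaming reads are the access pattern HDFS is built for.
            try (FSDataInputStream in = fs.open(file);
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }
}
```

Behind the scenes the client asks the name node which data nodes hold each block and then streams bytes to or from those data nodes directly; with 64 MB-or-larger blocks replicated three times by default, this is why HDFS suits large, streaming, write-once files rather than lots of small or frequently modified ones.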

detail
{'title': 'Big Data and Hadoop 1 | Hadoop Tutorial 1 | Big Data Tutorial 1 | Hadoop Tutorial for Beginners - 1', 'heatmap': [{'end': 2320.456, 'start': 2185.846, 'weight': 0.714}, {'end': 2710.186, 'start': 2577.123, 'weight': 0.728}, {'end': 3480.405, 'start': 3219.336, 'weight': 0.752}, {'end': 5290.792, 'start': 5029.028, 'weight': 0.76}, {'end': 6056.54, 'start': 5923.461, 'weight': 0.789}, {'end': 7217.544, 'start': 6437.904, 'weight': 0.959}], 'summary': 'This tutorial series on hadoop introduces concepts like hdfs, big data analytics, io speed challenges, hdfs features, name node, core concepts, data writing, data management, and data processes. it also includes real-world examples and insights from professionals, making it suitable for beginners and interview preparation.', 'chapters': [{'end': 708.223, 'segs': [{'end': 118.238, 'src': 'embed', 'start': 84.168, 'weight': 1, 'content': [{'end': 87.229, 'text': 'that hi, Gaurav, how are you on the chat window?', 'start': 84.168, 'duration': 3.061}, {'end': 88.37, 'text': 'that what are you doing?', 'start': 87.229, 'duration': 1.141}, {'end': 89.51, 'text': 'what is your current profile?', 'start': 88.37, 'duration': 1.14}, {'end': 96.453, 'text': "and I'll keep sharing with everyone so that everyone knows what the kind of co-learners we have.", 'start': 89.51, 'duration': 6.943}, {'end': 100.062, 'text': 'Can you quickly write it in the chat window?', 'start': 98.681, 'duration': 1.381}, {'end': 105.284, 'text': "You can just write your name name we'll come to know and you can just quickly write your experience level.", 'start': 100.082, 'duration': 5.202}, {'end': 108.506, 'text': 'what is your motivation for doing this course and all those things?', 'start': 105.284, 'duration': 3.222}, {'end': 110.207, 'text': 'A little bit about yourself.', 'start': 109.146, 'duration': 1.061}, {'end': 118.238, 'text': "Okay, so I'm waiting for a couple of more people to join.", 'start': 115.635, 'duration': 2.603}], 'summary': 'Gaurav is collecting information from participants for a course, waiting for more people to join.', 'duration': 34.07, 'max_score': 84.168, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM84168.jpg'}, {'end': 441.704, 'src': 'embed', 'start': 399.939, 'weight': 3, 'content': [{'end': 403.162, 'text': 'put it that there may be a lot of such things that this is a live.', 'start': 399.939, 'duration': 3.223}, {'end': 404.983, 'text': "I'll be talking to multiple people.", 'start': 403.162, 'duration': 1.821}, {'end': 408.065, 'text': "so please don't mind if the question is not answered clear.", 'start': 404.983, 'duration': 3.082}, {'end': 410.246, 'text': 'please feel free to write it back on the chat window.', 'start': 408.065, 'duration': 2.181}, {'end': 413.048, 'text': "so more than what I'll do is I'll, I'll unmute you.", 'start': 410.246, 'duration': 2.802}, {'end': 417.591, 'text': "you can also introduce yourself to the audience and I that way you'll test your mic as well.", 'start': 413.048, 'duration': 4.543}, {'end': 418.231, 'text': 'is that fine?', 'start': 417.591, 'duration': 0.64}, {'end': 420.573, 'text': "more, okay, someone, I'm just unmuting you.", 'start': 418.231, 'duration': 2.342}, {'end': 422.414, 'text': "please make sure there's no background noise.", 'start': 420.573, 'duration': 1.841}, {'end': 426.121, 'text': 'Okay, so we have Mohan on now.', 'start': 424.099, 'duration': 2.022}, {'end': 431.326, 'text': 'Mohan, can you quickly 
introduce yourself? We are eager to hear about you.', 'start': 426.341, 'duration': 4.985}, {'end': 435.659, 'text': 'Can everyone hear me? I can hear you.', 'start': 432.227, 'duration': 3.432}, {'end': 437.68, 'text': 'Yes Yes, everyone says yes, Mohan.', 'start': 435.899, 'duration': 1.781}, {'end': 439.061, 'text': 'Okay, good.', 'start': 438.581, 'duration': 0.48}, {'end': 441.704, 'text': 'My name is Mohan Gopalakrishnan.', 'start': 439.822, 'duration': 1.882}], 'summary': 'Live conversation with multiple participants for introductions and mic checks.', 'duration': 41.765, 'max_score': 399.939, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM399939.jpg'}, {'end': 592.183, 'src': 'embed', 'start': 562.539, 'weight': 0, 'content': [{'end': 563.719, 'text': "There's nothing to feel humbled.", 'start': 562.539, 'duration': 1.18}, {'end': 564.519, 'text': "I mean, I'm learning.", 'start': 563.759, 'duration': 0.76}, {'end': 572.601, 'text': "I mean, so I probably will be the slowest learner here because I haven't done hands-on technical work in 10 years.", 'start': 564.76, 'duration': 7.841}, {'end': 576.342, 'text': 'So let me hope I learn as much as I can.', 'start': 572.681, 'duration': 3.661}, {'end': 577.813, 'text': "Don't worry, sir.", 'start': 577.133, 'duration': 0.68}, {'end': 581.016, 'text': "We'll be there to help you out and whatever help you need.", 'start': 577.854, 'duration': 3.162}, {'end': 583.477, 'text': 'We do have a 24 by 7 support team.', 'start': 581.036, 'duration': 2.441}, {'end': 584.678, 'text': 'So this is for everyone.', 'start': 583.497, 'duration': 1.181}, {'end': 590.422, 'text': 'Anyone facing any problem after the class, we have a 24 by 7 help desk where you can come.', 'start': 585.298, 'duration': 5.124}, {'end': 592.183, 'text': 'You can get one-on-one live support.', 'start': 590.462, 'duration': 1.721}], 'summary': 'Individual seeks help for technical learning after 10 years, offered 24/7 support', 'duration': 29.644, 'max_score': 562.539, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM562539.jpg'}, {'end': 693.217, 'src': 'embed', 'start': 667.712, 'weight': 2, 'content': [{'end': 673.055, 'text': "You can just quickly write on the chat window in the next two, three seconds and I'll unmute you or else I'll get started.", 'start': 667.712, 'duration': 5.343}, {'end': 680.48, 'text': 'So is there anyone who wants to introduce himself? 
Anand Kumaran is saying hi to Mohan.', 'start': 674.356, 'duration': 6.124}, {'end': 682.541, 'text': "So I'll pass it on.", 'start': 681.62, 'duration': 0.921}, {'end': 685.273, 'text': "Okay, so let's get started.", 'start': 683.792, 'duration': 1.481}, {'end': 688.635, 'text': "I mean, don't worry, we will have more opportunities to talk to each other.", 'start': 685.393, 'duration': 3.242}, {'end': 693.217, 'text': "Okay, we'll have more people and we'll have more opportunities to discuss.", 'start': 689.095, 'duration': 4.122}], 'summary': 'Informal introductions made during meeting, no significant data discussed.', 'duration': 25.505, 'max_score': 667.712, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM667712.jpg'}], 'start': 6.75, 'title': 'Introduction to hadoop and interactive session', 'summary': 'Introduces hadoop and hdfs with 12-13 participants from various domains, including professionals with 8, 24 years of experience, and a focus on distributed audit processing.', 'chapters': [{'end': 299.899, 'start': 6.75, 'title': 'Introduction to hadoop and hdfs', 'summary': 'Introduces the session on hadoop and hdfs, with 12-13 participants introducing themselves, including suman with 8 years of it experience, mohan as the head of adobe development center at chennai, devashish as a business analyst for 4 years in bfsi domains, and vidyasagar as a web developer with 5 years experience working with ibm.', 'duration': 293.149, 'highlights': ['12-13 participants introduced themselves, including Suman with 8 years of IT experience, Mohan as the head of Adobe Development Center at Chennai, Devashish as a business analyst for 4 years in BFSI domains, and Vidyasagar as a web developer with 5 years experience working with IBM.', 'Sandeep, an ex-Yahoo employee, and Sameer, who is going to University of Buffalo in New York this fall for MS, are other notable participants.', "Participants' experience ranges from 2.5 years for Yashwant, who previously worked as a PHP web developer, to 9 years for Syed Munawar, who is working in Oracle for 6 years for identity management services.", 'An echo issue was mentioned during the session.']}, {'end': 708.223, 'start': 300.439, 'title': 'Interactive session with participants from multiple domains', 'summary': 'Discusses an interactive session with around 13 to 14 participants from diverse domains, including a senior professional with 24 years of experience, aiming to transition into big data and analytics, emphasizing the shift to distributed audit processing.', 'duration': 407.784, 'highlights': ['Mohan Gopalakrishnan, with 24 years of experience, is the senior-most participant aiming to transition into big data and analytics, reflecting the interest in shifting to distributed audit processing.', 'The class consists of around 13 to 14 participants from diverse domains, including professionals experienced in technologies like Teradata, Informatica, Business Objects, and TIBCO, indicating a wide range of expertise present.', 'The session aims to maintain interactivity, allowing participants to raise questions and engage in discussions, showcasing a collaborative learning environment.', 'Participants are assured of 24/7 support, including one-on-one live assistance for technical issues, emphasizing the commitment to providing comprehensive support and assistance.', "The speaker emphasizes the future of IT lies in analytics and big data, aligning with the motivation of participants like Rohit, who aims 
to transition into analytics and big data, reflecting the industry's direction."]}], 'duration': 701.473, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM6750.jpg', 'highlights': ['Mohan Gopalakrishnan, with 24 years of experience, aims to transition into big data and analytics', "Participants' experience ranges from 2.5 to 9 years, with diverse technology backgrounds", "Session emphasizes the future of IT in analytics and big data, aligning with participants' motivations", 'Participants assured of 24/7 support and one-on-one live assistance for technical issues', 'Session aims to maintain interactivity, allowing participants to raise questions and engage in discussions']}, {'end': 2526.627, 'segs': [{'end': 1527.94, 'src': 'embed', 'start': 1499.572, 'weight': 6, 'content': [{'end': 1501.732, 'text': 'on that, then you can have a smiley for this one.', 'start': 1499.572, 'duration': 2.16}, {'end': 1505.053, 'text': 'I want to give a smiley for this.', 'start': 1501.732, 'duration': 3.321}, {'end': 1507.734, 'text': 'okay, but some people are happy with your own.', 'start': 1505.053, 'duration': 2.681}, {'end': 1509.275, 'text': 'not everyone is okay.', 'start': 1507.734, 'duration': 1.541}, {'end': 1510.995, 'text': 'lot of people are happy with your answer.', 'start': 1509.275, 'duration': 1.72}, {'end': 1516.292, 'text': 'okay, I think more than is not one is not happy with your odds.', 'start': 1510.995, 'duration': 5.297}, {'end': 1520.255, 'text': 'maybe more than one is more than one is also happy.', 'start': 1516.292, 'duration': 3.963}, {'end': 1522.356, 'text': 'okay, so everyone is happy with your answer.', 'start': 1520.255, 'duration': 2.101}, {'end': 1526.439, 'text': 'if there is anyone who is not happy with all the answer, you can make a bad face at me.', 'start': 1522.356, 'duration': 4.083}, {'end': 1527.94, 'text': "I'll not tell it to Raul.", 'start': 1526.439, 'duration': 1.501}], 'summary': 'Most people are happy with the answer, but not everyone. possibly more than one person is not happy.', 'duration': 28.368, 'max_score': 1499.572, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM1499572.jpg'}, {'end': 2214.041, 'src': 'embed', 'start': 2185.846, 'weight': 2, 'content': [{'end': 2192.581, 'text': "okay, now, Now we've understood that the amount of data being generated.", 'start': 2185.846, 'duration': 6.735}, {'end': 2198.729, 'text': "Let's see, what is the real challenge? I mean, what is the real problem? Now, we have so much, storing data is not a problem at all.", 'start': 2192.641, 'duration': 6.088}, {'end': 2202.054, 'text': 'Everyone has two TV, five TV hard drives in their homes.', 'start': 2198.81, 'duration': 3.244}, {'end': 2206.998, 'text': 'So what is the challenge? 
The challenge here will be explained by this example.', 'start': 2202.494, 'duration': 4.504}, {'end': 2214.041, 'text': "So let's see we have this traditional way of storing data where we have one single machine on the left hand side okay, which has four IO channels,", 'start': 2207.058, 'duration': 6.983}], 'summary': 'Data generation is high, storage is easy, challenge is data management.', 'duration': 28.195, 'max_score': 2185.846, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM2185846.jpg'}, {'end': 2320.456, 'src': 'heatmap', 'start': 2185.846, 'weight': 0.714, 'content': [{'end': 2192.581, 'text': "okay, now, Now we've understood that the amount of data being generated.", 'start': 2185.846, 'duration': 6.735}, {'end': 2198.729, 'text': "Let's see, what is the real challenge? I mean, what is the real problem? Now, we have so much, storing data is not a problem at all.", 'start': 2192.641, 'duration': 6.088}, {'end': 2202.054, 'text': 'Everyone has two TV, five TV hard drives in their homes.', 'start': 2198.81, 'duration': 3.244}, {'end': 2206.998, 'text': 'So what is the challenge? The challenge here will be explained by this example.', 'start': 2202.494, 'duration': 4.504}, {'end': 2214.041, 'text': "So let's see we have this traditional way of storing data where we have one single machine on the left hand side okay, which has four IO channels,", 'start': 2207.058, 'duration': 6.983}, {'end': 2216.402, 'text': 'which means it has four hard drives.', 'start': 2214.041, 'duration': 2.361}, {'end': 2217.943, 'text': 'okay, and each channel can.', 'start': 2216.402, 'duration': 1.541}, {'end': 2220.183, 'text': 'each hard drive can run at 100 MB per second.', 'start': 2217.943, 'duration': 2.24}, {'end': 2231.008, 'text': "So the challenge here is, let's see if how many of you can tell me how much time it will take to read this one terabyte of data from this machine.", 'start': 2220.684, 'duration': 10.324}, {'end': 2234.724, 'text': 'Can someone, can everyone try this out?', 'start': 2232.142, 'duration': 2.582}, {'end': 2243.47, 'text': "It's a maths problem, and so the challenge is one terabyte of data, four parallel reads happening okay through four hard drives,", 'start': 2234.804, 'duration': 8.666}, {'end': 2245.932, 'text': 'and each channel reads 100 MB per second.', 'start': 2243.47, 'duration': 2.462}, {'end': 2247.613, 'text': 'Most of you guys are engineers.', 'start': 2246.292, 'duration': 1.321}, {'end': 2252.597, 'text': 'I want you to just quickly get back to, you can use a calculator as well.', 'start': 2247.693, 'duration': 4.904}, {'end': 2254.218, 'text': 'Okay, we have an answer already.', 'start': 2252.857, 'duration': 1.361}, {'end': 2256.78, 'text': "I'll reveal your answer very soon.", 'start': 2255.499, 'duration': 1.281}, {'end': 2260.162, 'text': 'Anyone else can answer? 
Okay, Yashwant has also given me an answer.', 'start': 2257.2, 'duration': 2.962}, {'end': 2260.943, 'text': "Guys, you're quick.", 'start': 2260.282, 'duration': 0.661}, {'end': 2263.512, 'text': 'okay, someone has given me an answer.', 'start': 2262.071, 'duration': 1.441}, {'end': 2266.013, 'text': 'someone says i want that answer in minutes.', 'start': 2263.512, 'duration': 2.501}, {'end': 2269.434, 'text': 'okay, i want that answer in minutes.', 'start': 2266.013, 'duration': 3.421}, {'end': 2275.456, 'text': 'so so you, the question is, and i want everyone else, everyone to try this.', 'start': 2269.434, 'duration': 6.022}, {'end': 2276.717, 'text': 'everyone has to give an answer.', 'start': 2275.456, 'duration': 1.261}, {'end': 2279.638, 'text': "whoever doesn't give me an answer, i will not allow them in the class.", 'start': 2276.717, 'duration': 2.921}, {'end': 2285.2, 'text': 'okay, okay, so i have an answer for anand kumaran, who says 170 minutes.', 'start': 2279.638, 'duration': 5.562}, {'end': 2287.061, 'text': 'raul tivari says three seconds.', 'start': 2285.2, 'duration': 1.861}, {'end': 2287.661, 'text': 'i will.', 'start': 2287.061, 'duration': 0.6}, {'end': 2288.821, 'text': 'you are really fast, man.', 'start': 2287.661, 'duration': 1.16}, {'end': 2290.022, 'text': 'it cannot happen in three seconds.', 'start': 2288.821, 'duration': 1.201}, {'end': 2292.546, 'text': 'okay, someone says 2500.', 'start': 2291.365, 'duration': 1.181}, {'end': 2296.828, 'text': 'okay, lot of people have given me the right answer now.', 'start': 2292.546, 'duration': 4.282}, {'end': 2302.01, 'text': 'okay, I just tells me 10.24 seconds.', 'start': 2296.828, 'duration': 5.182}, {'end': 2302.811, 'text': 'I take a look quickly.', 'start': 2302.01, 'duration': 0.801}, {'end': 2303.271, 'text': 'explain me.', 'start': 2302.811, 'duration': 0.46}, {'end': 2304.732, 'text': 'how did you arrive at that?', 'start': 2303.271, 'duration': 1.461}, {'end': 2307.053, 'text': 'quite a few people have given me answer like 45 minutes.', 'start': 2304.732, 'duration': 2.321}, {'end': 2307.753, 'text': '43 minutes.', 'start': 2307.053, 'duration': 0.7}, {'end': 2312.115, 'text': '40 minutes, which approximately is the answer, but I would still like to understand how much they.', 'start': 2307.753, 'duration': 4.362}, {'end': 2312.916, 'text': 'how did he?', 'start': 2312.115, 'duration': 0.801}, {'end': 2314.557, 'text': 'okay, some deep is also being an answer.', 'start': 2312.916, 'duration': 1.641}, {'end': 2316.277, 'text': '40 minutes approximately.', 'start': 2314.557, 'duration': 1.72}, {'end': 2317.658, 'text': 'okay idea, how did you arrive at 10.45?', 'start': 2316.277, 'duration': 1.381}, {'end': 2320.456, 'text': 'got a big Okay.', 'start': 2317.658, 'duration': 2.798}], 'summary': 'Challenge: reading 1tb data from 4 hard drives takes around 40 minutes.', 'duration': 134.61, 'max_score': 2185.846, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM2185846.jpg'}, {'end': 2247.613, 'src': 'embed', 'start': 2220.684, 'weight': 5, 'content': [{'end': 2231.008, 'text': "So the challenge here is, let's see if how many of you can tell me how much time it will take to read this one terabyte of data from this machine.", 'start': 2220.684, 'duration': 10.324}, {'end': 2234.724, 'text': 'Can someone, can everyone try this out?', 'start': 2232.142, 'duration': 2.582}, {'end': 2243.47, 'text': "It's a maths problem, and so the challenge is one terabyte of 
data, four parallel reads happening okay through four hard drives,", 'start': 2234.804, 'duration': 8.666}, {'end': 2245.932, 'text': 'and each channel reads 100 MB per second.', 'start': 2243.47, 'duration': 2.462}, {'end': 2247.613, 'text': 'Most of you guys are engineers.', 'start': 2246.292, 'duration': 1.321}], 'summary': 'Challenge: calculate time to read 1tb data with 4 parallel reads at 100mb/s.', 'duration': 26.929, 'max_score': 2220.684, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM2220684.jpg'}, {'end': 2296.828, 'src': 'embed', 'start': 2269.434, 'weight': 0, 'content': [{'end': 2275.456, 'text': 'so so you, the question is, and i want everyone else, everyone to try this.', 'start': 2269.434, 'duration': 6.022}, {'end': 2276.717, 'text': 'everyone has to give an answer.', 'start': 2275.456, 'duration': 1.261}, {'end': 2279.638, 'text': "whoever doesn't give me an answer, i will not allow them in the class.", 'start': 2276.717, 'duration': 2.921}, {'end': 2285.2, 'text': 'okay, okay, so i have an answer for anand kumaran, who says 170 minutes.', 'start': 2279.638, 'duration': 5.562}, {'end': 2287.061, 'text': 'raul tivari says three seconds.', 'start': 2285.2, 'duration': 1.861}, {'end': 2287.661, 'text': 'i will.', 'start': 2287.061, 'duration': 0.6}, {'end': 2288.821, 'text': 'you are really fast, man.', 'start': 2287.661, 'duration': 1.16}, {'end': 2290.022, 'text': 'it cannot happen in three seconds.', 'start': 2288.821, 'duration': 1.201}, {'end': 2292.546, 'text': 'okay, someone says 2500.', 'start': 2291.365, 'duration': 1.181}, {'end': 2296.828, 'text': 'okay, lot of people have given me the right answer now.', 'start': 2292.546, 'duration': 4.282}], 'summary': 'Participants gave varying answers, with some providing correct responses after initial attempts.', 'duration': 27.394, 'max_score': 2269.434, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM2269434.jpg'}, {'end': 2502.485, 'src': 'embed', 'start': 2456.201, 'weight': 1, 'content': [{'end': 2464.925, 'text': 'so now, if you had to crowd, said, analyze over 500 terabytes of data, do you think it is possible to do it at all using this method?', 'start': 2456.201, 'duration': 8.724}, {'end': 2467.27, 'text': "It's completely impossible.", 'start': 2466.209, 'duration': 1.061}, {'end': 2470.412, 'text': 'Every day you will not, it will take years to actually read that data.', 'start': 2467.33, 'duration': 3.082}, {'end': 2472.433, 'text': 'If not years, then at least days and months.', 'start': 2470.452, 'duration': 1.981}, {'end': 2474.954, 'text': "Okay, so what we'll do?", 'start': 2473.073, 'duration': 1.881}, {'end': 2475.875, 'text': 'what is the solution then?', 'start': 2474.954, 'duration': 0.921}, {'end': 2478.237, 'text': 'Can we look at the figure on the right hand side?', 'start': 2476.555, 'duration': 1.682}, {'end': 2481.118, 'text': 'What we do here is that we divide this data.', 'start': 2478.577, 'duration': 2.541}, {'end': 2483.42, 'text': 'we partition this data into 10 parts.', 'start': 2481.118, 'duration': 2.302}, {'end': 2488.181, 'text': 'okay, store the, say, 10 parts in different machines.', 'start': 2483.9, 'duration': 4.281}, {'end': 2494.943, 'text': 'okay, and then we try to see how much time it will take to read, and this is a very, very simplistic view of things.', 'start': 2488.181, 'duration': 6.762}, {'end': 2496.423, 'text': 'it is not all that simple.', 'start': 
2494.943, 'duration': 1.48}, {'end': 2502.485, 'text': "but assume for us to start with, let's assume that we had this one terabyte of data divided across 10 machines,", 'start': 2496.423, 'duration': 6.062}], 'summary': 'Divide 500 terabytes of data into 10 parts for efficient analysis and storage on different machines.', 'duration': 46.284, 'max_score': 2456.201, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM2456201.jpg'}], 'start': 708.703, 'title': 'Hadoop and big data analytics', 'summary': 'Covers the structure and content of a hadoop course, understanding big data in class with real-world examples, introduction to data analytics and big data, and the challenges of big data analytics, including time efficiency and scalability.', 'chapters': [{'end': 902.735, 'start': 708.703, 'title': 'Hadoop course structure', 'summary': 'Introduces the structure and content of a hadoop course, covering topics such as hdfs, mapreduce, pig, hive, and java prerequisites, with emphasis on hands-on practice and support availability.', 'duration': 194.032, 'highlights': ['The course covers topics such as HDFS, MapReduce, Pig, Hive, Zookeeper, Scoop, HBase, and Java prerequisites for MapReduce programming.', 'The course emphasizes hands-on practice, with the second week dedicated to setting up a Hadoop cluster and running HDFS commands.', 'The support team is available to address questions and provide assistance, with the option for participants to attend a free Java course if needed.', 'The instructor, Abhishek, has over 10 years of experience in the IT world and has been working on Hadoop for a couple of years.', 'Participants are encouraged to ask questions through the chat window, with the instructor committed to addressing all pending queries during logical breaks.']}, {'end': 1855.09, 'start': 903.235, 'title': 'Understanding big data in class', 'summary': 'Discusses the concept of big data, highlighting examples from facebook, airline industry, power grid system, and stock exchange, while emphasizing the distinction between structured and unstructured data with interactive student engagement.', 'duration': 951.855, 'highlights': ['The amount of data generated by Facebook is around 500 terabytes per day, making it a significant example of big data.', 'A single airplane generates around 10 terabytes of data for every half-hour flight, equivalent to the daily data generated by Facebook, illustrating the substantial data volume produced by the airline industry.', 'The power grid system generates a vast amount of data that needs analysis, particularly for resolving disputes related to electricity distribution between states, showcasing another instance of big data application.', 'The New York Stock Exchange generates substantial unstructured data, serving as an example of big data generation in the financial domain.', 'The chapter engages students in an interactive discussion to differentiate between structured and unstructured data, emphasizing the importance of understanding unstructured data in the context of big data.']}, {'end': 2050.074, 'start': 1855.17, 'title': 'Introduction to data analytics and big data', 'summary': "Discusses the introduction to big data, including unstructured data and its relevance to hadoop, as well as ajay's interest in data analytics using r and sas, with a focus on understanding and analyzing large amounts of unstructured data.", 'duration': 194.904, 'highlights': ["Ajay's interest in data analytics using R and 
SAS", 'Discussion on unstructured data and its relevance to Hadoop', 'Various responses to the concept of unstructured data', 'Humorous interaction and engagement among participants', 'Reference to big data in Indian government offices']}, {'end': 2526.627, 'start': 2050.094, 'title': 'Big data analytics challenges', 'summary': 'Discusses the challenges of handling huge unstructured data, the need for big data analytics, and the limitations of traditional data storage, with a focus on time efficiency and scalability, exemplified by the comparison of reading one terabyte of data from a single machine versus ten machines in parallel.', 'duration': 476.533, 'highlights': ['The need for big data analytics', 'Challenges of traditional data storage', 'Impact of distributed storage on data processing time']}], 'duration': 1817.924, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM708703.jpg', 'highlights': ['The course covers topics such as HDFS, MapReduce, Pig, Hive, Zookeeper, Scoop, HBase, and Java prerequisites for MapReduce programming.', 'The amount of data generated by Facebook is around 500 terabytes per day, making it a significant example of big data.', 'The course emphasizes hands-on practice, with the second week dedicated to setting up a Hadoop cluster and running HDFS commands.', 'A single airplane generates around 10 terabytes of data for every half-hour flight, equivalent to the daily data generated by Facebook, illustrating the substantial data volume produced by the airline industry.', 'The support team is available to address questions and provide assistance, with the option for participants to attend a free Java course if needed.', 'The power grid system generates a vast amount of data that needs analysis, particularly for resolving disputes related to electricity distribution between states, showcasing another instance of big data application.', 'The instructor, Abhishek, has over 10 years of experience in the IT world and has been working on Hadoop for a couple of years.', 'The New York Stock Exchange generates substantial unstructured data, serving as an example of big data generation in the financial domain.', 'The chapter engages students in an interactive discussion to differentiate between structured and unstructured data, emphasizing the importance of understanding unstructured data in the context of big data.', 'The need for big data analytics']}, {'end': 3422.921, 'segs': [{'end': 2710.186, 'src': 'heatmap', 'start': 2577.123, 'weight': 0.728, 'content': [{'end': 2578.385, 'text': "so let's let's move on.", 'start': 2577.123, 'duration': 1.262}, {'end': 2579.026, 'text': "let's see, then.", 'start': 2578.385, 'duration': 0.641}, {'end': 2583.273, 'text': "So we've understood.", 'start': 2581.152, 'duration': 2.121}, {'end': 2592.415, 'text': 'now here is that to solve, to do this big data analytics, we have to figure out a way of putting this data across multiple nodes.', 'start': 2583.273, 'duration': 9.142}, {'end': 2598.417, 'text': "Okay, when I say nodes, don't worry about what, I keep on introducing jargon in between and I'll explain it.", 'start': 2592.595, 'duration': 5.822}, {'end': 2602.558, 'text': "Don't worry if you do not understand what a node is, I'm sure everyone can guess what a node is.", 'start': 2598.497, 'duration': 4.061}, {'end': 2608.56, 'text': 'but we have to distribute this data across different machines or nodes, okay,', 'start': 2602.958, 'duration': 5.602}, {'end': 2615.523, 
'text': 'and that looks like a probable solution to solve this problem of analyzing or crunching this huge amount of data, okay.', 'start': 2608.56, 'duration': 6.963}, {'end': 2625.067, 'text': "So let's see how are we, so it takes 4.5 minutes, and let's see what a distributed file system is.", 'start': 2616.024, 'duration': 9.043}, {'end': 2630.19, 'text': "So we have understood what, why do we need a DFS, now let's understand what a DFS is.", 'start': 2625.388, 'duration': 4.802}, {'end': 2643.735, 'text': 'so here what you see on the left is for machines, for storage devices sitting at four different locations, okay, and once.', 'start': 2633.325, 'duration': 10.41}, {'end': 2647.839, 'text': 'So they have their own file system.', 'start': 2645.438, 'duration': 2.401}, {'end': 2649.139, 'text': "They're physically separate machines.", 'start': 2647.859, 'duration': 1.28}, {'end': 2657.321, 'text': "After I create a distributed file system out of these, what I'll get is something like this where, at the top of it,", 'start': 2650.119, 'duration': 7.202}, {'end': 2663.023, 'text': 'it will look like we are accessing a single file system, but underlying hardware is sitting at different places.', 'start': 2657.321, 'duration': 5.702}, {'end': 2666.144, 'text': 'At a logical level, they all come in a directory structure.', 'start': 2663.783, 'duration': 2.361}, {'end': 2670.945, 'text': 'So if you look at your Windows machine, it will look like there are multiple directories which you are having.', 'start': 2666.164, 'duration': 4.781}, {'end': 2672.626, 'text': 'Does that make sense??', 'start': 2671.765, 'duration': 0.861}, {'end': 2674.128, 'text': 'Is that making sense to everyone??', 'start': 2672.706, 'duration': 1.422}, {'end': 2676.33, 'text': 'What is the difference between?', 'start': 2675.189, 'duration': 1.141}, {'end': 2679.974, 'text': 'how will this physical thing get converted into a distributed file system?', 'start': 2676.33, 'duration': 3.644}, {'end': 2684.319, 'text': 'We will look into details, but just to give you an idea, this is how it will look like.', 'start': 2680.495, 'duration': 3.824}, {'end': 2685.78, 'text': 'Okay, great.', 'start': 2684.919, 'duration': 0.861}, {'end': 2687.522, 'text': 'A lot of people are giving me smileys.', 'start': 2685.8, 'duration': 1.722}, {'end': 2692.087, 'text': 'If there is anyone who has not understood this, I can quickly tell you again.', 'start': 2687.682, 'duration': 4.405}, {'end': 2695.224, 'text': 'So one, two, three, everyone has understood.', 'start': 2693.541, 'duration': 1.683}, {'end': 2697.489, 'text': 'Great So everyone has given me smileys now.', 'start': 2695.425, 'duration': 2.064}, {'end': 2699.533, 'text': 'Great Some people are getting lazy.', 'start': 2697.609, 'duration': 1.924}, {'end': 2701.216, 'text': 'Okay, Ajay says, please, once again.', 'start': 2699.593, 'duration': 1.623}, {'end': 2701.858, 'text': 'Sure, Ajay.', 'start': 2701.417, 'duration': 0.441}, {'end': 2702.8, 'text': 'So Ajay, see.', 'start': 2702.218, 'duration': 0.582}, {'end': 2710.186, 'text': 'There are these different machines which are sitting at different physical locations, okay, different physical locations.', 'start': 2704.383, 'duration': 5.803}], 'summary': 'Distributed file system distributes data across nodes, facilitating big data analytics. 
machines at different locations form a single file system.', 'duration': 133.063, 'max_score': 2577.123, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM2577123.jpg'}, {'end': 2742.148, 'src': 'embed', 'start': 2717.369, 'weight': 0, 'content': [{'end': 2724.012, 'text': 'But logically, at a logical level, I have created a file system where all these machines look like they are a part of a single file system.', 'start': 2717.369, 'duration': 6.643}, {'end': 2731.098, 'text': 'Okay, so as we saw in the last slide that we have to have multiple machines from where multiple reads can be done in parallel.', 'start': 2724.432, 'duration': 6.666}, {'end': 2737.864, 'text': 'So, to solve that problem, we used multiple machines, not a single place to do IO, but at a higher level, at a logical level,', 'start': 2731.558, 'duration': 6.306}, {'end': 2742.148, 'text': 'at a directory structure level, they become a part of a file system.', 'start': 2737.864, 'duration': 4.284}], 'summary': 'A file system was created to make multiple machines appear as part of a single file system, allowing parallel reads.', 'duration': 24.779, 'max_score': 2717.369, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM2717369.jpg'}], 'start': 2527.557, 'title': 'Io speed challenges and hadoop usage', 'summary': 'Discusses the challenges of io speed, the need for a distributed file system, and the role of hadoop framework for distributed processing. it also explores companies using hadoop, key components, and differences between hdfs cluster and mapreduce engine, including examples of google, yahoo, facebook, amazon, ibm, linkedin, and the aadhaar scheme.', 'chapters': [{'end': 2999.566, 'start': 2527.557, 'title': 'Challenges of io speed and the need for distributed file system', 'summary': 'Discusses the challenge of io speed, the need for a distributed file system to handle large unstructured data, and the role of hadoop as a framework for distributed processing of large data sets on commodity computers using a simple programming model.', 'duration': 472.009, 'highlights': ['The challenge of IO speed is emphasized over storage capacity, requiring a distributed file system to handle the large amount of data.', 'Explanation of a distributed file system, where physically separate machines appear as a single file system at a logical level, to support multiple reads in parallel.', 'Hadoop is introduced as a framework for distributed processing of large data sets on commodity computers using the MapReduce programming model.']}, {'end': 3422.921, 'start': 2999.566, 'title': 'Companies using hadoop & hadoop components', 'summary': 'Discusses companies using hadoop, including google, yahoo, facebook, amazon, ibm, linkedin, and the aadhaar scheme, and introduces hadoop components - hdfs and mapreduce, with an emphasis on understanding the key terms and their associations, and the difference between hdfs cluster and mapreduce engine.', 'duration': 423.355, 'highlights': ['The chapter emphasizes the companies using Hadoop, including Google, Yahoo, Facebook, Amazon, IBM, LinkedIn, and the Aadhaar scheme, highlighting the widespread adoption of Hadoop in big data analytics.', "The explanation of Hadoop components - HDFS and MapReduce, stresses the importance of understanding the terms 'name node', 'job tracker', 'data node', and 'task tracker', establishing the fundamental knowledge required for further learning.", 'The 
distinction between HDFS cluster and MapReduce engine is highlighted, with HDFS cluster representing the storage configuration of masters and slaves, and MapReduce engine being the programming model used to analyze the stored data.']}], 'duration': 895.364, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM2527557.jpg', 'highlights': ['The challenge of IO speed is emphasized over storage capacity, requiring a distributed file system to handle the large amount of data.', "The explanation of Hadoop components - HDFS and MapReduce, stresses the importance of understanding the terms 'name node', 'job tracker', 'data node', and 'task tracker', establishing the fundamental knowledge required for further learning.", 'The distinction between HDFS cluster and MapReduce engine is highlighted, with HDFS cluster representing the storage configuration of masters and slaves, and MapReduce engine being the programming model used to analyze the stored data.', 'Hadoop is introduced as a framework for distributed processing of large data sets on commodity computers using the MapReduce programming model.', 'The chapter emphasizes the companies using Hadoop, including Google, Yahoo, Facebook, Amazon, IBM, LinkedIn, and the Aadhaar scheme, highlighting the widespread adoption of Hadoop in big data analytics.', 'Explanation of a distributed file system, where physically separate machines appear as a single file system at a logical level, to support multiple reads in parallel.']}, {'end': 4595.299, 'segs': [{'end': 4099.046, 'src': 'embed', 'start': 4067.645, 'weight': 3, 'content': [{'end': 4071.087, 'text': "please don't move to HDFS, even if you have a huge amount of data.", 'start': 4067.645, 'duration': 3.442}, {'end': 4078.411, 'text': 'Fine? Lots of small files again because the file information is stored as metadata and if you are.', 'start': 4071.587, 'duration': 6.824}, {'end': 4079.771, 'text': 'Yeah, I am telling you that, Sandeep.', 'start': 4078.411, 'duration': 1.36}, {'end': 4080.872, 'text': 'Just give me once again.', 'start': 4080.032, 'duration': 0.84}, {'end': 4085.714, 'text': 'Because if you store these.', 'start': 4081.973, 'duration': 3.741}, {'end': 4088.256, 'text': 'See, Sandeep is asking me why.', 'start': 4085.714, 'duration': 2.542}, {'end': 4092.92, 'text': "Okay, and I'll answer this question as before we move on out of this.", 'start': 4089.056, 'duration': 3.864}, {'end': 4099.046, 'text': 'Okay, and when I say where HDFS is not a good fit, I also add something called today.', 'start': 4093.481, 'duration': 5.565}], 'summary': 'Avoid moving to hdfs for large amounts of small files due to metadata storage. 
hdfs not a good fit in certain cases.', 'duration': 31.401, 'max_score': 4067.645, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM4067645.jpg'}, {'end': 4373.99, 'src': 'embed', 'start': 4324.548, 'weight': 0, 'content': [{'end': 4327.372, 'text': 'okay, on this the job tracker runs.', 'start': 4324.548, 'duration': 2.824}, {'end': 4337.5, 'text': 'okay, job tracker is a daemon which runs on the name node and data node is again that commodity hardware on which another daemon called task tracker runs.', 'start': 4327.372, 'duration': 10.128}, {'end': 4343.044, 'text': 'So now I am introducing these two terms in slightly more technical language,', 'start': 4337.96, 'duration': 5.084}, {'end': 4346.706, 'text': 'but going forward in the next couple of slides we will understand more details about it.', 'start': 4343.044, 'duration': 3.662}, {'end': 4356.072, 'text': 'Is that fine with everyone? So major components on HDFS is name nodes and task tracker and job tracker and data node and task tracker.', 'start': 4347.526, 'duration': 8.546}, {'end': 4360.323, 'text': 'okay, Samir is asking what is a demon?', 'start': 4357.662, 'duration': 2.661}, {'end': 4361.624, 'text': 'can anyone answer that question?', 'start': 4360.323, 'duration': 1.301}, {'end': 4362.244, 'text': 'what is a demon?', 'start': 4361.624, 'duration': 0.62}, {'end': 4368.547, 'text': 'quickly, okay, Rahul, Rahul wants to answer that question.', 'start': 4362.244, 'duration': 6.303}, {'end': 4369.548, 'text': 'Rahul is with me.', 'start': 4368.547, 'duration': 1.001}, {'end': 4373.99, 'text': 'Rahul, you want to answer that question?', 'start': 4369.548, 'duration': 4.442}], 'summary': 'Introduction to hdfs components: name node, data node, job tracker, and task tracker. exploring technical language and concepts.', 'duration': 49.442, 'max_score': 4324.548, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM4324548.jpg'}], 'start': 3423.361, 'title': 'Hdfs features and benefits', 'summary': 'Highlights the key features and benefits of hadoop distributed file system (hdfs), including fault tolerance through data replication, high throughput performance, time reduction in data access, suitability for large files, streaming access to file system data, and use of commodity hardware. it also outlines the limitations of hdfs, emphasizing low latency data access, small file management, and data modification.', 'chapters': [{'end': 3510.379, 'start': 3423.361, 'title': 'Understanding hdfs features', 'summary': 'Highlights the key features of hdfs, emphasizing its fault tolerance through data replication on multiple machines and high throughput performance.', 'duration': 87.018, 'highlights': ['HDFS ensures fault tolerance by replicating data on a minimum of three machines, making it highly fault tolerant.', 'The system uses a distributed file system to store data on multiple data nodes to mitigate the risk of nodes becoming faulty over time or at any point.', 'HDFS emphasizes high throughput performance as one of its important features.']}, {'end': 4019.187, 'start': 3510.379, 'title': 'Hadoop distributed file system', 'summary': 'Discusses the benefits of using hadoop distributed file system (hdfs) for large-scale data processing, including the potential time reduction in data access by using parallel processing and the suitability of hdfs for storing and accessing large files. 
it also emphasizes the significance of streaming access to file system data and the use of commodity hardware for building hdfs.', 'duration': 508.808, 'highlights': ['HDFS reduces time by one-tenth in a ten node cluster and approximately a hundred times in a hundred node cluster when using DFS for data access.', 'HDFS is suitable for large data sets and not suitable for applications with small files, as it is more efficient for storing and accessing large amounts of data.', 'Streaming access to file system data is a key feature of HDFS, making it suitable for operations like write once and read many times, such as analyzing logs.', 'HDFS can be built out of commodity hardware, allowing the construction of high-end storage devices without using high-end servers.', 'The chapter also highlights the importance of HDFS for cases where a huge amount of data is stored in a single file compared to small data spread across multiple files.']}, {'end': 4595.299, 'start': 4020.068, 'title': 'Understanding hdfs: key concepts and limitations', 'summary': 'Outlines the limitations of hdfs, emphasizing low latency data access, small file management, data modification, and introduces the major components of hdfs, including name nodes, data nodes, job tracker, and task tracker.', 'duration': 575.231, 'highlights': ['HDFS limitations: low latency data access, small file management, and data modification', 'Introduction of major components of HDFS: name nodes, data nodes, job tracker, and task tracker', 'Explanation of demon (daemon) in the context of HDFS and comparison with Windows and DOS']}], 'duration': 1171.938, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM3423361.jpg', 'highlights': ['HDFS ensures fault tolerance by replicating data on a minimum of three machines, making it highly fault tolerant.', 'HDFS reduces time by one-tenth in a ten node cluster and approximately a hundred times in a hundred node cluster when using DFS for data access.', 'HDFS emphasizes high throughput performance as one of its important features.', 'HDFS is suitable for large data sets and not suitable for applications with small files, as it is more efficient for storing and accessing large amounts of data.', 'Streaming access to file system data is a key feature of HDFS, making it suitable for operations like write once and read many times, such as analyzing logs.', 'HDFS can be built out of commodity hardware, allowing the construction of high-end storage devices without using high-end servers.', 'The chapter also highlights the importance of HDFS for cases where a huge amount of data is stored in a single file compared to small data spread across multiple files.', 'HDFS limitations: low latency data access, small file management, and data modification', 'Introduction of major components of HDFS: name nodes, data nodes, job tracker, and task tracker', 'Explanation of demon (daemon) in the context of HDFS and comparison with Windows and DOS']}, {'end': 5880.09, 'segs': [{'end': 4736.815, 'src': 'embed', 'start': 4711.575, 'weight': 0, 'content': [{'end': 4718.058, 'text': 'And the idea behind putting this car here is that you need to understand that we have to put a very, very reliable hardware here.', 'start': 4711.575, 'duration': 6.483}, {'end': 4720.16, 'text': 'Double, triple redundancy hardware.', 'start': 4718.539, 'duration': 1.621}, {'end': 4721.961, 'text': 'Okay High reliability machine.', 'start': 4720.7, 'duration': 1.261}, {'end': 4723.681, 'text': 
'You might end up putting a raid here.', 'start': 4722.341, 'duration': 1.34}, {'end': 4727.169, 'text': 'Okay So name node is the master of the system.', 'start': 4723.882, 'duration': 3.287}, {'end': 4730.451, 'text': 'It maintains and manages the blocks which are present on the data node.', 'start': 4727.65, 'duration': 2.801}, {'end': 4733.413, 'text': "Okay Please note that it doesn't store any data.", 'start': 4730.912, 'duration': 2.501}, {'end': 4736.815, 'text': 'It maintains and manages the blocks which are present on data nodes.', 'start': 4733.633, 'duration': 3.182}], 'summary': 'Emphasizing highly reliable hardware with double, triple redundancy and a master name node to manage data blocks.', 'duration': 25.24, 'max_score': 4711.575, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM4711575.jpg'}, {'end': 5290.792, 'src': 'heatmap', 'start': 5029.028, 'weight': 0.76, 'content': [{'end': 5031.449, 'text': 'He says two servers are required for name node, data node.', 'start': 5029.028, 'duration': 2.421}, {'end': 5039.032, 'text': 'Well, ideally in the real world there will be a name node which will be on a high availability hard machine like a Lamborghini,', 'start': 5031.949, 'duration': 7.083}, {'end': 5040.552, 'text': 'and data node will be like ambassadors.', 'start': 5039.032, 'duration': 1.52}, {'end': 5045.394, 'text': 'But if you want to try out something, then a same machine can be used.', 'start': 5041.032, 'duration': 4.362}, {'end': 5048.436, 'text': 'to create one data node and one name node.', 'start': 5045.834, 'duration': 2.602}, {'end': 5049.897, 'text': 'Okay, it can run on the same machine.', 'start': 5048.616, 'duration': 1.281}, {'end': 5053.419, 'text': 'Okay, now Mohan has asked me a question.', 'start': 5050.397, 'duration': 3.022}, {'end': 5054.759, 'text': 'is the name node also commodity??', 'start': 5053.419, 'duration': 1.34}, {'end': 5058.622, 'text': 'No, Mohan, that was the whole point in actually putting this Lamborghini picture here.', 'start': 5055.1, 'duration': 3.522}, {'end': 5064.185, 'text': 'Name node has to be a double triple redundant machine which can be even arrayed.', 'start': 5058.682, 'duration': 5.503}, {'end': 5066.987, 'text': 'Okay, so it cannot be a commodity hardware.', 'start': 5064.665, 'duration': 2.322}, {'end': 5070.467, 'text': 'Because this is the single point of failure in the system.', 'start': 5068.186, 'duration': 2.281}, {'end': 5073.63, 'text': "And when I say single point of failure, I'll explain it.", 'start': 5070.928, 'duration': 2.702}, {'end': 5074.97, 'text': "So don't worry about it right now.", 'start': 5073.67, 'duration': 1.3}, {'end': 5077.152, 'text': "Okay? We'll understand.", 'start': 5075.211, 'duration': 1.941}, {'end': 5085.137, 'text': 'Does that answer your question? Can I move ahead? Okay.', 'start': 5077.372, 'duration': 7.765}, {'end': 5095.884, 'text': 'Great Data node can be redundant.', 'start': 5085.777, 'duration': 10.107}, {'end': 5098.782, 'text': "Okay, now let's go a little deep.", 'start': 5097.295, 'duration': 1.487}, {'end': 5101.011, 'text': "Mohan, we'll answer these questions as we move ahead.", 'start': 5098.902, 'duration': 2.109}, {'end': 5106.105, 'text': "Okay, is that fine? 
Okay, as we move ahead, we see we've introduced a new term here.", 'start': 5101.072, 'duration': 5.033}, {'end': 5113.111, 'text': 'Can everyone look at this diagram for 10 seconds, take a deep breath and think about what?', 'start': 5106.466, 'duration': 6.645}, {'end': 5117.095, 'text': "Sandeep can think about his wife and I'll think about Katrina Kaif.", 'start': 5113.692, 'duration': 3.403}, {'end': 5123.44, 'text': 'Okay, and look at this and client, job tracker, task tracker, map, reduce.', 'start': 5117.715, 'duration': 5.725}, {'end': 5124.701, 'text': 'What is the new term here?', 'start': 5123.78, 'duration': 0.921}, {'end': 5126.423, 'text': 'A client right?', 'start': 5125.382, 'duration': 1.041}, {'end': 5128.825, 'text': 'Anything else which you see as a new term on this slide?', 'start': 5126.903, 'duration': 1.922}, {'end': 5131.327, 'text': 'Do you understand everything else other than the client?', 'start': 5129.165, 'duration': 2.162}, {'end': 5133.677, 'text': "Everything else we've covered?", 'start': 5132.796, 'duration': 0.881}, {'end': 5134.498, 'text': 'Okay,', 'start': 5134.298, 'duration': 0.2}, {'end': 5135.559, 'text': 'MapReduce, yes.', 'start': 5134.798, 'duration': 0.761}, {'end': 5138.422, 'text': "We've talked about MapReduce earlier as well.", 'start': 5136.64, 'duration': 1.782}, {'end': 5143.527, 'text': "We've said MapReduce is a programming model to retrieve and analyze data.", 'start': 5138.862, 'duration': 4.665}, {'end': 5145.489, 'text': 'Only this much has to be understood today.', 'start': 5143.828, 'duration': 1.661}, {'end': 5147.972, 'text': "Okay It's a simple programming model.", 'start': 5145.91, 'duration': 2.062}, {'end': 5157.636, 'text': 'Okay Is that fine? So other than the client, everything else is clear to everyone? Okay, cool.', 'start': 5149.854, 'duration': 7.782}, {'end': 5165.239, 'text': "So client is actually, client is an application which you will use to, I'm giving the answer.", 'start': 5158.016, 'duration': 7.223}, {'end': 5171.542, 'text': 'A client is an application which you use to interact with both the name node and the data node.', 'start': 5165.6, 'duration': 5.942}, {'end': 5176.084, 'text': 'Okay, which means to interact with job tracker and task tracker.', 'start': 5171.842, 'duration': 4.242}, {'end': 5182.567, 'text': 'Is that fine? Is that fine? As a very simplistic definition, a client is an application which will be running in your machine.', 'start': 5176.184, 'duration': 6.383}, {'end': 5190.06, 'text': 'which you will use to interact or give commands or look at the statuses from a job tracker or task tracker.', 'start': 5184.039, 'duration': 6.021}, {'end': 5190.661, 'text': 'Is that correct??', 'start': 5190.1, 'duration': 0.561}, {'end': 5191.261, 'text': 'Is that okay??', 'start': 5190.741, 'duration': 0.52}, {'end': 5197.222, 'text': 'So interaction between the user and the name node and data node is done through a client, okay?', 'start': 5191.601, 'duration': 5.621}, {'end': 5199.083, 'text': 'And it is called the HDFS client.', 'start': 5197.722, 'duration': 1.361}, {'end': 5204.444, 'text': "Is that fine, everyone? Can you quickly give me a smiley on this? 
Or else I'll move on.", 'start': 5199.403, 'duration': 5.041}, {'end': 5206.444, 'text': 'Okay Cool.', 'start': 5204.924, 'duration': 1.52}, {'end': 5208.305, 'text': "Okay, let's get.", 'start': 5207.705, 'duration': 0.6}, {'end': 5212.728, 'text': 'somebody lucky question is an answer.', 'start': 5210.923, 'duration': 1.805}, {'end': 5225.435, 'text': "some people get an answer, don't worry, Okay, Devashish, client is a client, is a application or a software which will run on your machine?", 'start': 5212.728, 'duration': 12.707}, {'end': 5227.116, 'text': 'Okay, we will see an example.', 'start': 5225.895, 'duration': 1.221}, {'end': 5230.097, 'text': 'We will show you how a client looks like by the end of this.', 'start': 5227.136, 'duration': 2.961}, {'end': 5238.201, 'text': 'Okay client is an application which runs on your machine, which you will use to interact with data node or a name node,', 'start': 5230.237, 'duration': 7.964}, {'end': 5240.382, 'text': 'which in turn means the job tracker or a task tracker.', 'start': 5238.201, 'duration': 2.181}, {'end': 5243.563, 'text': 'Okay, all your questions will be answered.', 'start': 5240.522, 'duration': 3.041}, {'end': 5248.397, 'text': 'Anand, and Sunday, your question will also be answered.', 'start': 5243.563, 'duration': 4.834}, {'end': 5255.659, 'text': "don't worry, just keep that in your mind.", 'start': 5248.397, 'duration': 7.262}, {'end': 5257.099, 'text': 'just keep these questions in your mind.', 'start': 5255.659, 'duration': 1.44}, {'end': 5260.3, 'text': 'all these questions will be answered.', 'start': 5257.099, 'duration': 3.201}, {'end': 5264.842, 'text': "okay, so let me, let me, let me, just right, I'll answer your question as well.", 'start': 5260.3, 'duration': 4.542}, {'end': 5265.742, 'text': 'just give me one second.', 'start': 5264.842, 'duration': 0.9}, {'end': 5271.704, 'text': "okay, now let's look at this diagram, this this is a little more complex than the earlier diagrams.", 'start': 5265.742, 'duration': 5.962}, {'end': 5274.765, 'text': 'a few more new terms are also introduced in this.', 'start': 5271.704, 'duration': 3.061}, {'end': 5280.344, 'text': "okay, and, and I'm going to explain this, don't worry, you'll,", 'start': 5274.765, 'duration': 5.579}, {'end': 5290.792, 'text': "by the end of by them I end up this light will understand each and every time they're okay, I so we have a name node here which is clear to everyone.", 'start': 5280.344, 'duration': 10.448}], 'summary': 'Two servers needed for name node, data node. name node requires high availability hardware, while data node can run on the same machine. client is an application used to interact with name node and data node.', 'duration': 261.764, 'max_score': 5029.028, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM5029028.jpg'}, {'end': 5176.084, 'src': 'embed', 'start': 5145.91, 'weight': 2, 'content': [{'end': 5147.972, 'text': "Okay It's a simple programming model.", 'start': 5145.91, 'duration': 2.062}, {'end': 5157.636, 'text': 'Okay Is that fine? So other than the client, everything else is clear to everyone? 
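Since MapReduce is introduced here only as "a programming model", a short sketch can show the shape of that model, assuming Hadoop's Java mapreduce API; the word-count style logic is just an illustration of emitting and aggregating key-value pairs, not an example used in the session.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MapReduceShapeSketch {

    // map(): runs once per record of an input split and emits (key, value) pairs.
    public static class WordMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(line.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);            // e.g. ("hadoop", 1)
            }
        }
    }

    // reduce(): receives every value for one key and produces the final result.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable c : counts) {
                total += c.get();
            }
            context.write(word, new IntWritable(total)); // e.g. ("hadoop", 42)
        }
    }
}

Only these two hooks are user code; splitting the input, shipping the tasks, and shuffling the intermediate pairs is handled by the framework.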
Okay, cool.', 'start': 5149.854, 'duration': 7.782}, {'end': 5165.239, 'text': "So client is actually, client is an application which you will use to, I'm giving the answer.", 'start': 5158.016, 'duration': 7.223}, {'end': 5171.542, 'text': 'A client is an application which you use to interact with both the name node and the data node.', 'start': 5165.6, 'duration': 5.942}, {'end': 5176.084, 'text': 'Okay, which means to interact with job tracker and task tracker.', 'start': 5171.842, 'duration': 4.242}], 'summary': 'Simple programming model for client to interact with name node and data node.', 'duration': 30.174, 'max_score': 5145.91, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM5145910.jpg'}, {'end': 5690.338, 'src': 'embed', 'start': 5653.878, 'weight': 6, 'content': [{'end': 5658.801, 'text': "I've not heard from a couple of guys for a long time.", 'start': 5653.878, 'duration': 4.923}, {'end': 5659.961, 'text': 'I want them to answer.', 'start': 5658.801, 'duration': 1.16}, {'end': 5663.063, 'text': 'no, I went.', 'start': 5659.961, 'duration': 3.102}, {'end': 5664.404, 'text': "I haven't heard from British.", 'start': 5663.063, 'duration': 1.341}, {'end': 5675.968, 'text': "I'm asking the default size.", 'start': 5671.505, 'duration': 4.463}, {'end': 5679.991, 'text': 'so I just giving me an answer 8192 bytes, Pawan.', 'start': 5675.968, 'duration': 4.023}, {'end': 5680.892, 'text': "Pawan, you've not.", 'start': 5679.991, 'duration': 0.901}, {'end': 5682.433, 'text': "you've not answered anything.", 'start': 5680.892, 'duration': 1.541}, {'end': 5684.634, 'text': 'friend, can you, can you get quickly?', 'start': 5682.433, 'duration': 2.201}, {'end': 5687.897, 'text': 'google this and give me an answer.', 'start': 5684.634, 'duration': 3.263}, {'end': 5689.638, 'text': 'you can keep this answer for yourself.', 'start': 5687.897, 'duration': 1.741}, {'end': 5690.338, 'text': "I'll tell you.", 'start': 5689.638, 'duration': 0.7}], 'summary': 'Requesting response from missing contacts regarding default size of 8192 bytes.', 'duration': 36.46, 'max_score': 5653.878, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM5653878.jpg'}, {'end': 5747.478, 'src': 'embed', 'start': 5718.938, 'weight': 3, 'content': [{'end': 5720.56, 'text': 'It can be configured to a bigger size.', 'start': 5718.938, 'duration': 1.622}, {'end': 5722.321, 'text': 'The minimum is 64 MB.', 'start': 5720.94, 'duration': 1.381}, {'end': 5724.684, 'text': 'Okay, Does that make sense?', 'start': 5722.762, 'duration': 1.922}, {'end': 5731.311, 'text': "And this clearly tells me if you have smaller file sizes, there will be a lot of metadata which you'll have to store,", 'start': 5725.305, 'duration': 6.006}, {'end': 5733.773, 'text': 'and it will not be a very useful system to use.', 'start': 5731.311, 'duration': 2.462}, {'end': 5737.435, 'text': 'Is that clear to everyone? Usually it will be much more than 64 MB.', 'start': 5734.534, 'duration': 2.901}, {'end': 5741.376, 'text': 'We will be configuring it in a few GBs or maybe sometime in TBs.', 'start': 5737.595, 'duration': 3.781}, {'end': 5742.517, 'text': 'Is that fine?', 'start': 5741.996, 'duration': 0.521}, {'end': 5747.478, 'text': 'Does this answer that question which everyone had, that why do we want to have?', 'start': 5742.637, 'duration': 4.841}], 'summary': 'Configurable storage, minimum 64 mb. 
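As a concrete counterpart to the block-size discussion, here is a minimal sketch, assuming Hadoop's Java API, of reading and overriding the HDFS block size. The property name "dfs.block.size" is the Hadoop 1.x era name (newer releases also accept "dfs.blocksize"), the 128 MB value is only an illustration, and the file path is a placeholder.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Override the default (64 MB in this generation of Hadoop) for new files.
        conf.setLong("dfs.block.size", 128L * 1024 * 1024); // 128 MB, illustrative

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/edureka/bigfile.log"); // placeholder path
        // Files already in HDFS report the block size they were written with.
        if (fs.exists(file)) {
            long blockSize = fs.getFileStatus(file).getBlockSize();
            System.out.println("Block size of " + file + ": " + blockSize + " bytes");
        }
        fs.close();
    }
}

Bigger blocks mean fewer entries in the name node's metadata for the same volume of data, which is exactly why tiny block sizes (and lots of tiny files) hurt HDFS.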
typically in gbs or tbs.', 'duration': 28.54, 'max_score': 5718.938, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM5718938.jpg'}, {'end': 5833.293, 'src': 'embed', 'start': 5783.572, 'weight': 1, 'content': [{'end': 5786.173, 'text': 'So we have a client which interacts with a name node.', 'start': 5783.572, 'duration': 2.601}, {'end': 5789.495, 'text': 'It also interacts with the data nodes, okay.', 'start': 5786.613, 'duration': 2.882}, {'end': 5791.516, 'text': 'The client does a read and write.', 'start': 5790.195, 'duration': 1.321}, {'end': 5793.937, 'text': "There's something called metadata here.", 'start': 5792.376, 'duration': 1.561}, {'end': 5795.018, 'text': 'So name.', 'start': 5794.177, 'duration': 0.841}, {'end': 5796.538, 'text': 'can anyone tell me what is a metadata??', 'start': 5795.018, 'duration': 1.52}, {'end': 5799.8, 'text': 'Quickly, can someone tell me what is a metadata??', 'start': 5797.639, 'duration': 2.161}, {'end': 5806.103, 'text': 'Okay so, a lot of people have told me the right answer.', 'start': 5803.502, 'duration': 2.601}, {'end': 5808.883, 'text': 'It is a configuration data, data about data, data into data.', 'start': 5806.123, 'duration': 2.76}, {'end': 5809.884, 'text': "Yes, that's true.", 'start': 5809.243, 'duration': 0.641}, {'end': 5816.825, 'text': "So metadata is not the actual data, the actual logs which we'll be storing, but the data about data, which means where are these?", 'start': 5810.184, 'duration': 6.641}, {'end': 5821.987, 'text': "if I'm storing a block name, node will have this information that where is this data getting stored?", 'start': 5816.825, 'duration': 5.162}, {'end': 5831.472, 'text': 'Okay, that information, which are the name nodes, where are the racks there?', 'start': 5823.667, 'duration': 7.805}, {'end': 5833.293, 'text': 'on which rack, which data node is there?', 'start': 5831.472, 'duration': 1.821}], 'summary': 'Client interacts with name and data nodes, performing reads and writes; metadata is configuration data about stored data.', 'duration': 49.721, 'max_score': 5783.572, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM5783572.jpg'}], 'start': 4595.319, 'title': 'Hadoop hardware, hdfs client, racks, and block size', 'summary': 'Discusses the importance of reliable hardware in hadoop, emphasizing double and triple redundancy, explains the role of hdfs client in interacting with job tracker and task tracker, describes racks as physical collections of data nodes and the process of data replication, highlights the significance of metadata in hdfs, and emphasizes the need for larger block sizes to minimize metadata and enhance storage efficiency.', 'chapters': [{'end': 5123.44, 'start': 4595.319, 'title': 'Hardware analogy in hadoop', 'summary': 'Discusses the importance of reliable hardware in hadoop, comparing the name node to a lamborghini and the data nodes to ambassadors, emphasizing the need for double and triple redundancy in the hardware.', 'duration': 528.121, 'highlights': ['The name node in Hadoop requires very reliable hardware with double and triple redundancy, similar to a Lamborghini, reflecting its importance as the master of the system.', 'The data nodes in Hadoop are likened to ambassadors, indicating the need for redundant hardware to ensure smooth functioning.', 'The name node is a single point of failure in the system, requiring double and triple redundant hardware and cannot 
be commodity hardware.']}, {'end': 5274.765, 'start': 5123.78, 'title': 'Understanding hdfs client', 'summary': 'Explains the concept of an hdfs client as an application for interacting with the name node and data node in hadoop, emphasizing its role in interacting with job tracker and task tracker, and highlighting its simplicity as a programming model.', 'duration': 150.985, 'highlights': ['The HDFS client is an application used to interact with both the name node and the data node, which includes interacting with the job tracker and task tracker, serving as a simplistic programming model for data retrieval and analysis.', 'The chapter emphasizes the simplicity of the HDFS client as a programming model for retrieving and analyzing data, ensuring that this understanding is clear among the audience.', 'The instructor reassures the audience that all questions about the HDFS client will be answered and provides an example to illustrate how a client looks like, ensuring that everyone understands the concept.', 'The instructor engages with the audience to address questions and concerns, creating an interactive learning environment and ensuring that all doubts are addressed.']}, {'end': 5566.444, 'start': 5274.765, 'title': 'Understanding racks and data replication', 'summary': 'Explains the concept of racks in a hadoop system, with a rack being a physical collection of data nodes stored at a single location, and the process of data replication to maintain fault tolerance, with a minimum of three replicas required by hdfs.', 'duration': 291.679, 'highlights': ['A rack is a physical collection of data nodes stored at a single location, which can be at different physical locations, and there can be multiple racks in a single location.', 'Data replication is used to maintain fault tolerance in a Hadoop system, with a minimum of three replicas required by HDFS, and can be defined by the user.', 'The block size and operations are discussed in the context of Hadoop, with the recommendation to replicate data on at least three data nodes for fault tolerance.']}, {'end': 5880.09, 'start': 5566.844, 'title': 'Hdfs block size and data storage', 'summary': 'Discusses the importance of data replication and the default block sizes in hdfs and unix, emphasizing the need for larger block sizes to minimize metadata and enhance storage efficiency, highlighting the significance of metadata in hdfs, and explaining the roles of name node, data nodes, and clients in the hdfs architecture.', 'duration': 313.246, 'highlights': ['The default block size in HDFS is 64 MB, while the default block size in Unix is 8192 bytes, emphasizing the significance of larger block sizes to reduce metadata and enhance storage efficiency.', "Metadata in HDFS refers to data about data, such as the location of stored data, block maps, and data node information, emphasizing its storage in the name node's RAM for fast access.", 'The chapter explains the roles of name node, data nodes, and clients in the HDFS architecture, underlining their respective functions in data storage and retrieval.']}], 'duration': 1284.771, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM4595319.jpg', 'highlights': ['The name node in Hadoop requires very reliable hardware with double and triple redundancy, similar to a Lamborghini, reflecting its importance as the master of the system.', 'The data nodes in Hadoop are likened to ambassadors, indicating the need for redundant hardware to ensure smooth 
functioning.', 'The name node is a single point of failure in the system, requiring double and triple redundant hardware and cannot be commodity hardware.', 'The default block size in HDFS is 64 MB, while the default block size in Unix is 8192 bytes, emphasizing the significance of larger block sizes to reduce metadata and enhance storage efficiency.', "Metadata in HDFS refers to data about data, such as the location of stored data, block maps, and data node information, emphasizing its storage in the name node's RAM for fast access.", 'A rack is a physical collection of data nodes stored at a single location, which can be at different physical locations, and there can be multiple racks in a single location.', 'The HDFS client is an application used to interact with both the name node and the data node, which includes interacting with the job tracker and task tracker, serving as a simplistic programming model for data retrieval and analysis.', 'Data replication is used to maintain fault tolerance in a Hadoop system, with a minimum of three replicas required by HDFS, and can be defined by the user.', 'The block size and operations are discussed in the context of Hadoop, with the recommendation to replicate data on at least three data nodes for fault tolerance.', 'The chapter explains the roles of name node, data nodes, and clients in the HDFS architecture, underlining their respective functions in data storage and retrieval.']}, {'end': 7144.616, 'segs': [{'end': 6056.54, 'src': 'heatmap', 'start': 5923.461, 'weight': 0.789, 'content': [{'end': 5928.825, 'text': "There's something called a secondary name node, and I am introducing a new term here, so please pay attention.", 'start': 5923.461, 'duration': 5.364}, {'end': 5932.948, 'text': "There's something called a secondary name node, which is used to.", 'start': 5929.165, 'duration': 3.783}, {'end': 5942.722, 'text': 'which periodically reads the data from the RAM of the name node and persists that data into a hard disk.', 'start': 5934.376, 'duration': 8.346}, {'end': 5945.163, 'text': 'So name node does not write it into a hard disk.', 'start': 5943.222, 'duration': 1.941}, {'end': 5946.704, 'text': 'Name node keeps the data in the RAM.', 'start': 5945.223, 'duration': 1.481}, {'end': 5951.568, 'text': 'It does not spend time or operations or cycles in writing into the hard disk.', 'start': 5946.824, 'duration': 4.744}, {'end': 5954.049, 'text': "There's something called a secondary name, node,", 'start': 5952.108, 'duration': 1.941}, {'end': 5961.154, 'text': 'which keeps on reading from the name node data from the RAM of the name node and writes into the hard disk.', 'start': 5954.049, 'duration': 7.105}, {'end': 5971.005, 'text': 'Is that clear to everyone? 
But secondary name node is not a substitute of a name node.', 'start': 5961.454, 'duration': 9.551}, {'end': 5974.687, 'text': 'Okay, so if the name node fails, the system goes down.', 'start': 5971.406, 'duration': 3.281}, {'end': 5976.568, 'text': 'And that is the case in Gen 1 Hadoop.', 'start': 5974.847, 'duration': 1.721}, {'end': 5980.11, 'text': "I'll tell you what happens in the Gen 2 Hadoop.", 'start': 5977.028, 'duration': 3.082}, {'end': 5983.632, 'text': "But for now, let's try to understand the name node as a single point of failure.", 'start': 5980.13, 'duration': 3.502}, {'end': 5987.69, 'text': "There's no backup name node.", 'start': 5984.568, 'duration': 3.122}, {'end': 5995.074, 'text': 'Though it is called a secondary name node, it does not become an active name node if the name node goes down.', 'start': 5988.51, 'duration': 6.564}, {'end': 6003.179, 'text': 'It just reads the metadata from the RAM of the name node periodically and keeps on writing into a file system.', 'start': 5995.514, 'duration': 7.665}, {'end': 6004.5, 'text': 'Is that clear to everyone?', 'start': 6003.539, 'duration': 0.961}, {'end': 6007, 'text': 'it does not come into picture.', 'start': 6005.18, 'duration': 1.82}, {'end': 6012.302, 'text': 'if it keeps a backup of metadata, okay, so it does not.', 'start': 6007, 'duration': 5.302}, {'end': 6015.222, 'text': 'if the name node fails, it does not become the name node.', 'start': 6012.302, 'duration': 2.92}, {'end': 6016.022, 'text': 'is that fine?', 'start': 6015.222, 'duration': 0.8}, {'end': 6021.944, 'text': 'so in gen 1 hadoop, when we talk about gen 1 hadoop, there is just a single point of failure, which is name node.', 'start': 6016.022, 'duration': 5.922}, {'end': 6023.724, 'text': 'if the name node fails, the system crashes.', 'start': 6021.944, 'duration': 1.78}, {'end': 6030.625, 'text': 'okay, and to get the system back you will have to put another name node and read the data back from the secondary name node,', 'start': 6023.724, 'duration': 6.901}, {'end': 6035.047, 'text': 'and you cannot retrieve the data from the time the last backup was taken.', 'start': 6030.625, 'duration': 4.422}, {'end': 6038.988, 'text': 'Is that fine to everyone? That is why we need a Lamborghini for a name node.', 'start': 6035.187, 'duration': 3.801}, {'end': 6045.731, 'text': 'Does that make sense to everyone? Is there anything which is not clear here guys? Okay, good.', 'start': 6039.369, 'duration': 6.362}, {'end': 6050.994, 'text': 'So everyone has understood why we needed a Lamborghini there? 
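Because the secondary name node only checkpoints periodically, how much metadata you could lose on a name node crash depends on the checkpoint interval. A small sketch, assuming Hadoop's Java Configuration API: the property names shown are the Hadoop 1.x era ones being described here (newer releases rename them under dfs.namenode.checkpoint.*), and the values are only illustrative.

import org.apache.hadoop.conf.Configuration;

public class CheckpointConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Checkpoint at least once per hour ...
        conf.setLong("fs.checkpoint.period", 3600);            // seconds, illustrative
        // ... or sooner, once the edit log grows beyond roughly 64 MB.
        conf.setLong("fs.checkpoint.size", 64L * 1024 * 1024); // bytes, illustrative

        System.out.println("checkpoint every "
                + conf.getLong("fs.checkpoint.period", 3600) + " s, or every "
                + conf.getLong("fs.checkpoint.size", 67108864) + " bytes of edits");
    }
}

Whatever was written after the last checkpoint is what you cannot recover from the secondary name node, which is why the name node hardware itself has to be the "Lamborghini" of the cluster.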
Because we do not have any backup to it.', 'start': 6046.112, 'duration': 4.882}, {'end': 6053.395, 'text': 'I send it.', 'start': 6052.594, 'duration': 0.801}, {'end': 6054.317, 'text': 'just give me some time.', 'start': 6053.395, 'duration': 0.922}, {'end': 6055.659, 'text': "I'll explain everything.", 'start': 6054.317, 'duration': 1.342}, {'end': 6056.54, 'text': 'l explain everything.', 'start': 6055.659, 'duration': 0.881}], 'summary': 'Secondary name node periodically reads and writes name node data to hard disk, but does not act as a backup for the name node, leading to single point of failure in gen 1 hadoop.', 'duration': 133.079, 'max_score': 5923.461, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM5923461.jpg'}, {'end': 6298.235, 'src': 'embed', 'start': 6264.913, 'weight': 5, 'content': [{'end': 6267.275, 'text': 'yes, there is a secondary name, not injector dope as well.', 'start': 6264.913, 'duration': 2.362}, {'end': 6268.396, 'text': 'it will be there because it.', 'start': 6267.275, 'duration': 1.121}, {'end': 6274.461, 'text': 'we have to keep store this, all this data, to a persistent people to possess all this data, and the hardest, okay.', 'start': 6268.396, 'duration': 6.065}, {'end': 6275.542, 'text': "so let's move forward, guys.", 'start': 6274.461, 'duration': 1.081}, {'end': 6282.168, 'text': 'if everything is clear, head Now, some questions can be answered offline as well, okay? Okay.', 'start': 6275.542, 'duration': 6.626}, {'end': 6286.068, 'text': 'okay, so now, now we have, we have.', 'start': 6283.467, 'duration': 2.601}, {'end': 6291.811, 'text': "we'll get into a little more details and we will look at how does a job tracker and a task tracker work?", 'start': 6286.068, 'duration': 5.743}, {'end': 6298.235, 'text': 'okay, so you can have a look at this diagram and by that we will surely explain it one by one.', 'start': 6291.811, 'duration': 6.424}], 'summary': 'Discussion about data storage and job tracker/task tracker work.', 'duration': 33.322, 'max_score': 6264.913, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM6264913.jpg'}, {'end': 6463.811, 'src': 'embed', 'start': 6437.904, 'weight': 1, 'content': [{'end': 6443.806, 'text': "okay, so I'll give you some 20 seconds of time to go through this picture.", 'start': 6437.904, 'duration': 5.902}, {'end': 6451.828, 'text': 'okay, and remember, it has mentioned some steps, something like copy input file, submit job, get input files info.', 'start': 6443.806, 'duration': 8.022}, {'end': 6453.568, 'text': 'then again there is one more submit job.', 'start': 6451.828, 'duration': 1.74}, {'end': 6454.668, 'text': 'there is create splits.', 'start': 6453.568, 'duration': 1.1}, {'end': 6459.23, 'text': 'there is some upload job information, some job.xml, job.jar.', 'start': 6454.668, 'duration': 4.562}, {'end': 6460.45, 'text': 'is there some input files?', 'start': 6459.23, 'duration': 1.22}, {'end': 6461.39, 'text': 'is there?', 'start': 6460.45, 'duration': 0.94}, {'end': 6463.811, 'text': "okay, so just I'm giving you 20 seconds of time.", 'start': 6461.39, 'duration': 2.421}], 'summary': '20 seconds to review picture with steps: copy input file, submit job, create splits, upload job info, job.xml, job.jar', 'duration': 25.907, 'max_score': 6437.904, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM6437904.jpg'}, {'end': 6533.921, 'src': 
'embed', 'start': 6501.937, 'weight': 4, 'content': [{'end': 6502.897, 'text': 'where is this DFS?', 'start': 6501.937, 'duration': 0.96}, {'end': 6514.43, 'text': 'First of all, when you work on a Hadoop environment, when you work on a clustered environment, The entire HDFS, the Hadoop Distributed File System,', 'start': 6503.437, 'duration': 10.993}, {'end': 6519.312, 'text': 'behaves as one file system, though it is spread across over all the data nodes.', 'start': 6514.43, 'duration': 4.882}, {'end': 6522.394, 'text': 'That is one of the features of a distributed file system.', 'start': 6519.813, 'duration': 2.581}, {'end': 6533.921, 'text': 'Who is a user? A person like you and me, who has some query, who needs some kind of information, who wants to get some kind of data, is a user.', 'start': 6524.435, 'duration': 9.486}], 'summary': 'Hdfs in a clustered environment behaves as one file system, spread across all data nodes.', 'duration': 31.984, 'max_score': 6501.937, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM6501937.jpg'}, {'end': 6727.305, 'src': 'embed', 'start': 6702.564, 'weight': 8, 'content': [{'end': 6715.662, 'text': 'Those job information is uploaded By the job tracker, the communication you can see, the client and the job tracker.', 'start': 6702.564, 'duration': 13.098}, {'end': 6724.304, 'text': 'you can assume it or you can understand it in such a way that there is a socket communication, kind of a thing between the user, the name, node,', 'start': 6715.662, 'duration': 8.642}, {'end': 6727.305, 'text': 'the job tracker and the job tracker and the task tracker.', 'start': 6724.304, 'duration': 3.001}], 'summary': 'Job information is uploaded by job tracker, involving socket communication between user, node, job tracker, and task tracker.', 'duration': 24.741, 'max_score': 6702.564, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM6702564.jpg'}, {'end': 6804.546, 'src': 'embed', 'start': 6776.165, 'weight': 3, 'content': [{'end': 6779.147, 'text': 'Some kind of information is uploaded on the DFS somewhere.', 'start': 6776.165, 'duration': 2.982}, {'end': 6790.317, 'text': 'and then that particular job list, this particular thing, job queue, this enters a job queue basically which is at the job tracker point.', 'start': 6780.911, 'duration': 9.406}, {'end': 6796.241, 'text': 'Okay, so the client then submits this particular job.', 'start': 6792.118, 'duration': 4.123}, {'end': 6802.805, 'text': 'We will see what happens exactly after this in the next slide, but are these points clear to you?', 'start': 6796.281, 'duration': 6.524}, {'end': 6804.546, 'text': 'all these steps listed out over here?', 'start': 6802.805, 'duration': 1.741}], 'summary': 'Information is uploaded on the dfs, enters a job queue, and client submits the job.', 'duration': 28.381, 'max_score': 6776.165, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM6776165.jpg'}, {'end': 6931.15, 'src': 'embed', 'start': 6908.126, 'weight': 0, 'content': [{'end': 6917.636, 'text': 'In traditional DFS environment, I will not go into too much of detail, there was a solution that the data would be present on different nodes.', 'start': 6908.126, 'duration': 9.51}, {'end': 6927.646, 'text': 'There was one master node kind of a configuration and when the data was to be processed, the data was pulled out from that particular machine,', 'start': 
6918.357, 'duration': 9.289}, {'end': 6931.15, 'text': 'was taken or torn to that particular master node and then processed.', 'start': 6927.646, 'duration': 3.504}], 'summary': 'Traditional dfs environment involves data present on different nodes with one master node for processing.', 'duration': 23.024, 'max_score': 6908.126, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM6908126.jpg'}, {'end': 7071.743, 'src': 'embed', 'start': 7041.957, 'weight': 9, 'content': [{'end': 7044.698, 'text': 'Then the job tracker creates maps and reduces.', 'start': 7041.957, 'duration': 2.741}, {'end': 7051.319, 'text': 'These are the programs that will be sent to the data node for the processing.', 'start': 7044.798, 'duration': 6.521}, {'end': 7058.26, 'text': 'Now from DFS it has the information where exactly this is stored.', 'start': 7052.419, 'duration': 5.841}, {'end': 7063.081, 'text': 'Now there is something named as input split.', 'start': 7060.861, 'duration': 2.22}, {'end': 7064.562, 'text': "I'll give you an explanation on this.", 'start': 7063.141, 'duration': 1.421}, {'end': 7069.863, 'text': 'And these maps and reduces are sent across to the data nodes.', 'start': 7066.382, 'duration': 3.481}, {'end': 7071.743, 'text': 'Now, what is an input split?', 'start': 7070.283, 'duration': 1.46}], 'summary': 'Job tracker creates maps and reduces sent to data nodes for processing, based on dfs information, including input split.', 'duration': 29.786, 'max_score': 7041.957, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM7041957.jpg'}], 'start': 5880.51, 'title': 'Hadoop name node and workflow', 'summary': "Discusses the role, structure, and high availability of hadoop's name node, along with the hadoop distributed file system (hdfs) and data processing workflow, important for interview preparation and understanding hadoop's capabilities in clustered environments.", 'chapters': [{'end': 6045.731, 'start': 5880.51, 'title': 'Understanding name node in hadoop', 'summary': 'Discusses the role of the name node in hadoop, including its function as a single point of failure, the role of the secondary name node in storing metadata, and the implications of system crashes.', 'duration': 165.221, 'highlights': ['The name node in Hadoop acts as a single point of failure, causing the system to crash if it fails, which is the case in Gen 1 Hadoop.', 'The secondary name node periodically reads data from the RAM of the name node and writes it into a hard disk, serving as a backup for metadata in case of system failure.', 'In Gen 1 Hadoop, there is no backup name node, and if the name node fails, another name node must be put in place, with data retrieval limited to the time of the last backup.', 'The secondary name node is not a substitute for the name node and does not become an active name node if the name node goes down, emphasizing the need for a reliable name node for system stability.']}, {'end': 6240.284, 'start': 6046.112, 'title': 'Hadoop name node structure', 'summary': "Discusses the importance of hadoop's name node structure, emphasizing the storage of data in ram, the role of the secondary name node in syncing data, and the differences between gen1 and gen2 hadoop in terms of name node structure, crucial for interview preparation.", 'duration': 194.172, 'highlights': ["Hadoop's name node stores all data in its RAM, with a secondary name node responsible for syncing and writing data into a 
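The submission steps on the slide (copy the input to DFS, submit the job, compute input splits, upload job.xml and job.jar, queue it at the job tracker) correspond roughly to a driver program like the minimal sketch below, assuming the Hadoop 2.x style Java mapreduce API of the CDH4 era; the input and output paths are placeholders, and the library-provided TokenCounterMapper and IntSumReducer are used only to keep the sketch short (they do the same word-count style work as the map/reduce shape shown earlier).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class SubmitJobSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count sketch");
        job.setJarByClass(SubmitJobSketch.class);   // this is what ends up in job.jar

        // Library mapper/reducer, standing in for your own map and reduce classes.
        job.setMapperClass(TokenCounterMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input splits are derived from the files under this input path.
        FileInputFormat.addInputPath(job, new Path("/user/edureka/input"));   // placeholder
        FileOutputFormat.setOutputPath(job, new Path("/user/edureka/output")); // placeholder

        // Submit and wait; the framework then hands one map task per split to
        // task trackers running next to the data (data localization).
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar and launched from the client machine, this is the "submit job" step; everything after that (scheduling, shipping the small map/reduce programs to the data nodes, collecting status) is done by the job tracker and task trackers.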
file system.", 'In Gen2 Hadoop, an active-passive name node structure is implemented, with an active name node and a passive name node, providing a standby mechanism in case of the active name node failure.', 'Understanding the differences between Gen1 and Gen2 Hadoop in terms of name node structure is crucial for interview preparation, as it demonstrates the advancements made to address single point of failure.']}, {'end': 6461.39, 'start': 6240.604, 'title': 'Hadoop high availability and job tracker', 'summary': 'Discusses the high availability of name nodes in hadoop, the role of a secondary name node, and the functioning of a job tracker and task tracker, while also introducing an expert in hadoop.', 'duration': 220.786, 'highlights': ['The chapter discusses the high availability of Name nodes in Hadoop, the role of a secondary name node, and the functioning of a job tracker and task tracker.', 'Introduction of an expert in Hadoop, Rahul, who has been working with the speaker in the same team and is known for his hardcore technical expertise.', 'Explanation of the steps involved in the functioning of job tracker and task tracker, including actions like copying input files, submitting jobs, creating splits, and uploading job information.']}, {'end': 6678.913, 'start': 6461.39, 'title': 'Hadoop distributed file system', 'summary': 'Discusses the roles of users, clients, and the hadoop distributed file system (hdfs) in a clustered environment, emphasizing the separation of name node and job tracker in a production environment, and the behavior of hadoop in copying and processing files.', 'duration': 217.523, 'highlights': ['The HDFS behaves as one file system in a clustered environment, although it is spread across all data nodes, emphasizing the distributed nature of the file system.', 'In a production environment, the name node and job tracker are run on separate hosts, underscoring the practical configuration of these components.', "The behavior of Hadoop involves a 'write once, read many times' approach, highlighting the nature of file operations and data access in Hadoop.", 'The user copies input files onto the DFS and submits jobs for analysis, demonstrating the steps involved in querying and processing data in Hadoop.', 'The metadata on the name node contains information about the location of files and data blocks, illustrating the role of the name node in providing essential file information.', 'The process of creating splits involves breaking down large jobs into smaller ones for more efficient processing, resembling the concept of breaking down a large program into smaller functions.']}, {'end': 7144.616, 'start': 6679.253, 'title': 'Hadoop data processing workflow', 'summary': 'Describes the process of job submission, data localization, and map-reduce program execution in hadoop, emphasizing the efficient processing and data handling capability, and the integration of multiple components in the hadoop ecosystem.', 'duration': 465.363, 'highlights': ['Hadoop prefers data localization, saving time and network bandwidth by sending small KB size programs to data nodes for processing, resulting in efficient handling of large data volumes.', 'The job submission process involves job splitting based on some logic or algorithm, with job information being uploaded to the DFS and entering a job queue for further processing.', 'The map-reduce programs in Hadoop process the data by generating key-value pairs, with the reducers working to produce the final output based on these pairs.']}], 
'duration': 1264.106, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM5880510.jpg', 'highlights': ['In Gen2 Hadoop, an active-passive name node structure is implemented, providing a standby mechanism.', 'The secondary name node periodically backs up metadata, serving as a backup for system failure.', 'Hadoop prefers data localization, saving time and network bandwidth for efficient processing.', "The behavior of Hadoop involves a 'write once, read many times' approach for file operations.", 'The metadata on the name node contains information about the location of files and data blocks.', 'The process of creating splits involves breaking down large jobs into smaller ones for efficient processing.', 'Understanding the differences between Gen1 and Gen2 Hadoop in terms of name node structure is crucial.', 'The name node in Hadoop acts as a single point of failure, causing the system to crash if it fails.', 'The HDFS behaves as one file system in a clustered environment, emphasizing its distributed nature.', 'Introduction of an expert in Hadoop, Rahul, known for his hardcore technical expertise.']}, {'end': 8411.724, 'segs': [{'end': 7332.818, 'src': 'embed', 'start': 7291.878, 'weight': 1, 'content': [{'end': 7296.442, 'text': 'okay, and onto each data node it has to be processed.', 'start': 7291.878, 'duration': 4.564}, {'end': 7302.567, 'text': 'so for each data node you require a map to process the entire file and then a reducer.', 'start': 7296.442, 'duration': 6.125}, {'end': 7305.93, 'text': 'so number of maps is equal to number of splits.', 'start': 7302.567, 'duration': 3.363}, {'end': 7307.091, 'text': 'does that answer your question?', 'start': 7305.93, 'duration': 1.161}, {'end': 7318.293, 'text': 'Sandeep Anand is asking is map like a pointer??', 'start': 7307.091, 'duration': 11.202}, {'end': 7326.864, 'text': 'No, map is not like a pointer, Mohan.', 'start': 7318.333, 'duration': 8.531}, {'end': 7330.497, 'text': 'the point is, One is asking, if I understand right.', 'start': 7326.864, 'duration': 3.633}, {'end': 7332.818, 'text': 'A job is split into maps.', 'start': 7331.157, 'duration': 1.661}], 'summary': 'Each data node requires a map and reducer, with number of maps equal to number of splits.', 'duration': 40.94, 'max_score': 7291.878, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM7291878.jpg'}, {'end': 7491.06, 'src': 'embed', 'start': 7418.102, 'weight': 0, 'content': [{'end': 7422.085, 'text': 'that one map program can be distributed to all data nodes.', 'start': 7418.102, 'duration': 3.983}, {'end': 7425.267, 'text': 'but then the maps are distributed.', 'start': 7422.085, 'duration': 3.182}, {'end': 7428.509, 'text': 'okay, try to understand it in this particular way.', 'start': 7425.267, 'duration': 3.242}, {'end': 7430.05, 'text': 'as many maps as split?', 'start': 7428.509, 'duration': 1.541}, {'end': 7434.212, 'text': 'you have written one program, but there are as many maps as splits.', 'start': 7430.05, 'duration': 4.162}, {'end': 7435.193, 'text': 'are there?', 'start': 7434.212, 'duration': 0.981}, {'end': 7437.855, 'text': 'so what exactly is happening on all the data nodes?', 'start': 7435.193, 'duration': 2.662}, {'end': 7439.816, 'text': 'you have to send across the maps.', 'start': 7437.855, 'duration': 1.961}, {'end': 7442.037, 'text': 'so how exactly do you know that?', 'start': 7439.816, 'duration': 2.221}, {'end': 7443.878, 'text': 'how 
many maps will be created?', 'start': 7442.037, 'duration': 1.841}, {'end': 7446.08, 'text': 'this is based on number of input splits.', 'start': 7443.878, 'duration': 2.202}, {'end': 7449.664, 'text': 'Is that clear to you??', 'start': 7448.803, 'duration': 0.861}, {'end': 7452.447, 'text': 'Did I answer your question, Mohan??', 'start': 7450.985, 'duration': 1.462}, {'end': 7462.797, 'text': 'Okay,', 'start': 7462.537, 'duration': 0.26}, {'end': 7466.001, 'text': 'Fine Okay.', 'start': 7465.18, 'duration': 0.821}, {'end': 7468.023, 'text': 'This slide is clear to all of you now.', 'start': 7466.181, 'duration': 1.842}, {'end': 7487.019, 'text': "Okay, we'll have more clarity on map and reduces as we go down the classes when we really have the map and reduce class.", 'start': 7479.218, 'duration': 7.801}, {'end': 7491.06, 'text': 'Okay, so let us go to the next slide.', 'start': 7487.48, 'duration': 3.58}], 'summary': 'Distribute one map program to all data nodes, creating maps based on number of input splits.', 'duration': 72.958, 'max_score': 7418.102, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM7418102.jpg'}, {'end': 7637.457, 'src': 'embed', 'start': 7589.586, 'weight': 4, 'content': [{'end': 7597.494, 'text': 'So what it is showing over here is only one blue colored block to give you an explanation that one of the blocks is assigned to the task tracker.', 'start': 7589.586, 'duration': 7.908}, {'end': 7617.984, 'text': 'Okay, so we have a yes from Sandeep is asking why H1 create as a blue.', 'start': 7611.32, 'duration': 6.664}, {'end': 7620.186, 'text': 'I hope I answered that question Sandeep.', 'start': 7618.044, 'duration': 2.142}, {'end': 7631.293, 'text': "Okay, Rohit Malik doesn't a wrapper run to assign task.", 'start': 7627.37, 'duration': 3.923}, {'end': 7637.457, 'text': "Rohit at this point of time, I'll give you a one-liner explanation to it.", 'start': 7632.674, 'duration': 4.783}], 'summary': 'Explanation of assigning one block to task tracker.', 'duration': 47.871, 'max_score': 7589.586, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM7589586.jpg'}, {'end': 8397.161, 'src': 'embed', 'start': 8371.343, 'weight': 6, 'content': [{'end': 8376.046, 'text': 'Okay, how does a read and write happen in HDFS and how is the data actually stored in HDFS.', 'start': 8371.343, 'duration': 4.703}, {'end': 8377.386, 'text': 'So please pay attention.', 'start': 8376.406, 'duration': 0.98}, {'end': 8380.348, 'text': 'It is a very, very, these are some important concepts.', 'start': 8377.627, 'duration': 2.721}, {'end': 8383.129, 'text': 'okay, which you need to understand first.', 'start': 8380.827, 'duration': 2.302}, {'end': 8387.653, 'text': 'okay, so now we know that there is a HDFS client and here is the name node.', 'start': 8383.129, 'duration': 4.524}, {'end': 8395.6, 'text': 'you must appreciate the fact that you have used a color coding scheme to denote a name node and a data node.', 'start': 8387.653, 'duration': 7.947}, {'end': 8397.161, 'text': 'the graphic guy has done a great job.', 'start': 8395.6, 'duration': 1.561}], 'summary': 'Explanation of hdfs read and write process with emphasis on name node and data node.', 'duration': 25.818, 'max_score': 8371.343, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM8371343.jpg'}], 'start': 7146.257, 'title': 'Hadoop core concepts', 'summary': 'Covers map and 
reduce concepts in hadoop, emphasizing their essential role in data processing, as well as the communication between job tracker and task tracker. it also discusses hdfs concepts, including installation methods and key concepts, with a focus on cloudera cdh4 installation and color coding scheme for name node and data node.', 'chapters': [{'end': 7534.016, 'start': 7146.257, 'title': 'Understanding map and reduce in hadoop', 'summary': 'Discusses the concept of map and reduce in hadoop, emphasizing that they are essential programs written in a language of choice like java, used to process data locally and generate output based on user-defined logic. it also touches upon the role of xml and .jar files, the relationship between the number of maps and input splits, and clarifies the distinction between maps and splits.', 'duration': 387.759, 'highlights': ['Maps and reduces are essential programs written in a language of choice like Java, used to process data locally and generate output based on user-defined logic.', 'Clarifies the distinction between maps and splits, emphasizing that a job is not split into maps, but rather splits are created for the file, and the input split file is divided into blocks and placed on data nodes for processing.', 'Discusses the relationship between the number of maps and input splits, explaining that the number of maps is equal to the number of input splits, as each split requires a map for processing.', 'Provides an overview of the concept of heartbeat, with participants defining it as a stay alive, health check, or communication report to ensure the channels are active.']}, {'end': 8175.069, 'start': 7535.185, 'title': 'Hadoop job tracker and task tracker', 'summary': 'Discusses the communication between the job tracker and task tracker in hadoop, emphasizing the assignment of jobs and the functionality of map and reduce tasks, while also providing a brief example of key-value pairs in mapreduce.', 'duration': 639.884, 'highlights': ['The job tracker assigns jobs to the task tracker based on their status, ensuring efficient task assignment and management.', "The task tracker's functionality includes managing map and reduce tasks, with configurable parameters for map and reduce slots, which can be further explored in later classes.", 'An example of key-value pairs in MapReduce is provided to illustrate the process of counting word occurrences, offering insight into the MapReduce functionality.']}, {'end': 8411.724, 'start': 8175.069, 'title': 'Understanding hdfs concepts', 'summary': 'Discussed the understanding of hdfs concepts, including job tracker and task tracker, installation methods, and important concepts related to hdfs, with a focus on the upcoming hands-on session and the use of cloudera cdh4 installation, as well as the color coding scheme for name node and data node.', 'duration': 236.655, 'highlights': ['The chapter emphasized the upcoming hands-on session and the use of Cloudera CDH4 installation, which can be run on machines with 4GB plus RAM or through a server using a TeamViewer application, with support available from Hesham.', 'It was mentioned that a cloud demo would be shown, and the importance of understanding how a HDFS client creates a new file, as well as the processes of adding a block, reading and writing data in HDFS, and how data is stored in HDFS, was highlighted as important concepts to grasp.', "The discussion touched upon the color coding scheme used to denote a name node and a data node in the graphics, with an appreciative 
mention of the graphic team's effort in maintaining consistency with the color scheme.", 'The chapter also included interactions around the understanding of job tracker and task tracker, with participants expressing their understanding and raising questions related to Hadoop installation and learning methods.']}], 'duration': 1265.467, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM7146257.jpg', 'highlights': ['Maps and reduces process data locally and generate output based on user-defined logic', 'Job tracker assigns jobs to task tracker based on their status for efficient task management', 'Number of maps is equal to the number of input splits, each split requires a map for processing', 'Concept of heartbeat ensures active communication channels for job and task trackers', 'HDFS client creates a new file, adds a block, and reads/writes data in HDFS', 'Color coding scheme denotes name node and data node in graphics for consistency', 'Cloud demo shown for Cloudera CDH4 installation, support available from Hesham', 'Understanding the process of counting word occurrences in MapReduce functionality']}, {'end': 9177.377, 'segs': [{'end': 8704.833, 'src': 'embed', 'start': 8673.058, 'weight': 0, 'content': [{'end': 8676.721, 'text': 'If anyone had this question in their mind earlier, on what basis will the data?', 'start': 8673.058, 'duration': 3.663}, {'end': 8677.861, 'text': 'will this be decided?', 'start': 8676.721, 'duration': 1.14}, {'end': 8678.822, 'text': 'that which data node to write??', 'start': 8677.861, 'duration': 0.961}, {'end': 8681.808, 'text': 'okay, so let me answer this question.', 'start': 8680.208, 'duration': 1.6}, {'end': 8694.291, 'text': 'so, very, at a very, very high level, it will decide based on the fact that the all the three data nodes should be as as close as possible.', 'start': 8681.808, 'duration': 12.483}, {'end': 8696.051, 'text': 'okay, number one.', 'start': 8694.291, 'duration': 1.76}, {'end': 8704.833, 'text': 'number two the data nodes are decided based on the fact that if there is a fault, then the minimum impact should be there.', 'start': 8696.051, 'duration': 8.782}], 'summary': 'Data nodes are chosen based on proximity and minimal impact in case of faults.', 'duration': 31.775, 'max_score': 8673.058, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM8673058.jpg'}, {'end': 9062.505, 'src': 'embed', 'start': 9036.172, 'weight': 1, 'content': [{'end': 9040.014, 'text': 'for very critical data, you may have want to create more replicas.', 'start': 9036.172, 'duration': 3.842}, {'end': 9044.316, 'text': 'okay, then you may think of writing in a different track, okay, but at the same time, with the three factor,', 'start': 9040.014, 'duration': 4.302}, {'end': 9046.357, 'text': 'no one actually does anything more than this.', 'start': 9044.316, 'duration': 2.041}, {'end': 9049.999, 'text': 'this is what is usually done in the industry and the production environments.', 'start': 9046.357, 'duration': 3.642}, {'end': 9052.16, 'text': 'guys, everyone, everyone clear with this.', 'start': 9049.999, 'duration': 2.161}, {'end': 9053.821, 'text': 'this is an important concept.', 'start': 9052.16, 'duration': 1.661}, {'end': 9056.282, 'text': 'can I get a quick yes or a smiley from everyone?', 'start': 9053.821, 'duration': 2.461}, {'end': 9061.344, 'text': 'great, great, great, Mohan Mohan, is that fine?', 'start': 9056.282, 'duration': 5.062}, 
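To tie the replication discussion to something runnable, here is a minimal sketch, assuming Hadoop's Java API, of how the replication factor is controlled: cluster- or job-wide through "dfs.replication" (default 3), or per file for the "very critical data" case mentioned above. The file path and the value 5 are only illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("dfs.replication", 3);   // the usual three-copy setting

        FileSystem fs = FileSystem.get(conf);
        Path critical = new Path("/user/edureka/critical.log"); // placeholder path
        if (fs.exists(critical)) {
            // Ask for extra copies of this one file only.
            fs.setReplication(critical, (short) 5);
            System.out.println("replication now: "
                    + fs.getFileStatus(critical).getReplication());
        }
        fs.close();
    }
}

In practice the three-replica default is what production clusters run with; raising it is reserved for data whose loss would be especially costly, since every extra copy costs storage and write bandwidth.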
{'end': 9062.505, 'text': "or it's kinda fine?", 'start': 9061.344, 'duration': 1.161}], 'summary': 'Creating more replicas for critical data is an industry standard with three-factor replication.', 'duration': 26.333, 'max_score': 9036.172, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM9036172.jpg'}], 'start': 8411.744, 'title': 'Hdfs data writing and fault tolerance', 'summary': 'Explains the process of writing data in hdfs, including the roles of name node, client, and data node, default replication factor of 3, and pipeline write process. it also discusses data node selection based on fault tolerance and minimal writing effort, and the process of block replication across multiple racks.', 'chapters': [{'end': 8592.548, 'start': 8411.744, 'title': 'Hdfs data writing process', 'summary': 'Explains the process of writing data in hdfs, including the role of the name node, client, and data node, as well as the default replication factor of 3 and the pipeline write process.', 'duration': 180.804, 'highlights': ['The name node provides information to the client on the nodes available for writing data and where the data should be written.', 'The default replication factor for data in HDFS is 3, which means the data is written to three different data nodes.', 'The name node stores metadata about the data nodes and identifies when a data node is full, enabling efficient data distribution.']}, {'end': 9177.377, 'start': 8592.568, 'title': 'Data node replication and fault tolerance', 'summary': 'Explains the basis for data node selection, discussing the factors of fault tolerance and minimal writing effort, and illustrates the process of block replication across multiple racks to ensure fault tolerance and minimal writing effort.', 'duration': 584.809, 'highlights': ['The chapter explains the basis for data node selection, discussing the factors of fault tolerance and minimal writing effort.', 'Illustrates the process of block replication across multiple racks to ensure fault tolerance and minimal writing effort.']}], 'duration': 765.633, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM8411744.jpg', 'highlights': ['The default replication factor for data in HDFS is 3, which means the data is written to three different data nodes.', 'The name node provides information to the client on the nodes available for writing data and where the data should be written.', 'The name node stores metadata about the data nodes and identifies when a data node is full, enabling efficient data distribution.', 'The chapter explains the basis for data node selection, discussing the factors of fault tolerance and minimal writing effort.', 'Illustrates the process of block replication across multiple racks to ensure fault tolerance and minimal writing effort.']}, {'end': 9956.607, 'segs': [{'end': 9251.437, 'src': 'embed', 'start': 9220.777, 'weight': 2, 'content': [{'end': 9222.997, 'text': 'but the fact is the two things which have to be remembered.', 'start': 9220.777, 'duration': 2.22}, {'end': 9230.279, 'text': 'first, copy on a same rack okay, Mohan told me that, yeah, more space on rack 3, right, Mohan, that is definitely there,', 'start': 9222.997, 'duration': 7.282}, {'end': 9232.319, 'text': 'but we did not tell you what is there on 9, 10, 11, 12.', 'start': 9230.279, 'duration': 2.04}, {'end': 9242.601, 'text': "also right, these data nodes may be having a lot of data and yes, rack 1 and 2 might 
be there, right, yes, that's also right, Samir.", 'start': 9232.319, 'duration': 10.282}, {'end': 9246.696, 'text': "okay. so let's see what happens with block C.", 'start': 9242.601, 'duration': 4.095}, {'end': 9251.437, 'text': 'So now that everyone has understood, I know where it will happen and what will happen next.', 'start': 9246.696, 'duration': 4.741}], 'summary': 'Discussion about data placement on racks and data nodes.', 'duration': 30.66, 'max_score': 9220.777, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM9220777.jpg'}, {'end': 9677.161, 'src': 'embed', 'start': 9647.113, 'weight': 1, 'content': [{'end': 9648.834, 'text': 'Right? Which we will discuss in this session.', 'start': 9647.113, 'duration': 1.721}, {'end': 9655.199, 'text': "Will everyone promise me that? I don't want this to be a lecture kind of scenario.", 'start': 9651.056, 'duration': 4.143}, {'end': 9658.86, 'text': 'Right? Can everyone give me a smiley? Mohan, you can say yes.', 'start': 9655.339, 'duration': 3.521}, {'end': 9661.184, 'text': "You don't want to write.", 'start': 9659.681, 'duration': 1.503}, {'end': 9661.885, 'text': 'Thank you.', 'start': 9661.484, 'duration': 0.401}, {'end': 9665.491, 'text': 'It would be nice if you can give a lot of examples as you go along.', 'start': 9662.105, 'duration': 3.386}, {'end': 9674.16, 'text': 'Sure Like the name value pairs that I asked about.', 'start': 9665.511, 'duration': 8.649}, {'end': 9675.361, 'text': "Don't worry.", 'start': 9674.18, 'duration': 1.181}, {'end': 9677.161, 'text': "I'll tell you one thing.", 'start': 9676.241, 'duration': 0.92}], 'summary': 'Encouraging audience engagement by requesting smiles and examples during the session.', 'duration': 30.048, 'max_score': 9647.113, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM9647113.jpg'}, {'end': 9744.564, 'src': 'embed', 'start': 9719.109, 'weight': 0, 'content': [{'end': 9724.353, 'text': 'So something which Mohan was asking and I will also ask this question to you and let me show you how this works.', 'start': 9719.109, 'duration': 5.244}, {'end': 9729.176, 'text': "In this diagram also, don't worry too much about what are these things.", 'start': 9724.773, 'duration': 4.403}, {'end': 9733.299, 'text': "It's very very simple and we'll understand it in a very very simple fashion.", 'start': 9729.276, 'duration': 4.023}, {'end': 9737.742, 'text': "In the next class we'll be seeing all these things happening on a cluster.", 'start': 9733.679, 'duration': 4.063}, {'end': 9740.983, 'text': 'So it will see it happening on a cluster in the real life.', 'start': 9738.602, 'duration': 2.381}, {'end': 9744.564, 'text': "Okay So now what we're going to do is, let me tell you what happens.", 'start': 9741.083, 'duration': 3.481}], 'summary': 'Explanation of a simple diagram and plan to demonstrate on a cluster in the next class.', 'duration': 25.455, 'max_score': 9719.109, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM9719109.jpg'}, {'end': 9964.473, 'src': 'embed', 'start': 9936.155, 'weight': 5, 'content': [{'end': 9939.837, 'text': 'So Sandeep has said that this time you have given a good example.', 'start': 9936.155, 'duration': 3.682}, {'end': 9941.097, 'text': 'Thank you Sandeep.', 'start': 9939.877, 'duration': 1.22}, {'end': 9942.878, 'text': 'I also appreciate you this time only.', 'start': 9941.278, 'duration': 1.6}, 
{'end': 9944.479, 'text': "Last time I didn't appreciate you.", 'start': 9943.258, 'duration': 1.221}, {'end': 9945.779, 'text': 'Okay Thanks.', 'start': 9944.519, 'duration': 1.26}, {'end': 9948.921, 'text': 'Okay Now Rohit has asked, does.', 'start': 9946.34, 'duration': 2.581}, {'end': 9956.607, 'text': 'Is it a synchronous or asynchronous? So there is no relationship, all the writes are asynchronous.', 'start': 9950.161, 'duration': 6.446}, {'end': 9964.473, 'text': "So you write, you are not writing, it's not that a write happens and then the whole system stops and keeps on waiting for the acknowledgement.", 'start': 9956.967, 'duration': 7.506}], 'summary': 'Discussion on synchronous vs asynchronous writes, emphasizing all writes are asynchronous.', 'duration': 28.318, 'max_score': 9936.155, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM9936155.jpg'}], 'start': 9177.378, 'title': 'Hadoop data management', 'summary': 'Covers the allocation of data nodes to racks and emphasizes the strategy of maintaining copies on the same rack to reduce operations. it also explains the hadoop data writing process, including data distribution and reliability measures.', 'chapters': [{'end': 9360.997, 'start': 9177.378, 'title': 'Data nodes and rack allocation', 'summary': 'Discusses the allocation of data nodes to racks, emphasizing the possibility of data ending up on different racks and the strategy of maintaining copies on the same rack to reduce operations, with an emphasis on the discussion between the speaker and participants.', 'duration': 183.619, 'highlights': ['The discussion emphasizes the possibility of data ending up on different racks and the strategy of maintaining copies on the same rack to reduce operations.', 'The speaker addresses the question of whether data is allowed to settle in time after being written to a node, focusing on the importance of ensuring that the data has been written before performing any operations.', 'The discussion involves the speaker and participants, with specific interactions and questions asked by individuals, creating an engaging and interactive learning environment.']}, {'end': 9956.607, 'start': 9360.997, 'title': 'Hadoop data writing process', 'summary': 'Explains the hadoop data writing process, including the operation details, the distribution of data across multiple nodes, and the difference between posted and non-posted write, which ensures data reliability.', 'duration': 595.61, 'highlights': ['Hadoop uses non-posted write for data reliability, waiting for an acknowledgement back from the data node, unlike posted write which does not wait for acknowledgement, ensuring data is settled before assuming it has been written.', 'The process of distributing data across multiple nodes in Hadoop involves giving some lag time or lead time for the machine to load up the full load, ensuring efficient processing and preventing overload.', 'Hadoop ensures data reliability by waiting for an acknowledgement back from the data node, and all the writes are asynchronous, ensuring efficient and reliable data handling.']}], 'duration': 779.229, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM9177378.jpg', 'highlights': ['Hadoop ensures data reliability by waiting for an acknowledgement back from the data node, and all the writes are asynchronous, ensuring efficient and reliable data handling.', 'The discussion emphasizes the possibility of data ending 
up on different racks and the strategy of maintaining copies on the same rack to reduce operations.', 'Hadoop uses non-posted write for data reliability, waiting for an acknowledgement back from the data node, unlike posted write which does not wait for acknowledgement, ensuring data is settled before assuming it has been written.', 'The speaker addresses the question of whether data is allowed to settle in time after being written to a node, focusing on the importance of ensuring that the data has been written before performing any operations.', 'The process of distributing data across multiple nodes in Hadoop involves giving some lag time or lead time for the machine to load up the full load, ensuring efficient processing and preventing overload.', 'The discussion involves the speaker and participants, with specific interactions and questions asked by individuals, creating an engaging and interactive learning environment.']}, {'end': 10845.201, 'segs': [{'end': 10142.703, 'src': 'embed', 'start': 10098.375, 'weight': 0, 'content': [{'end': 10102.517, 'text': "Rahul, the fact is that if I do not get an acknowledgement, I'll assume that it has failed.", 'start': 10098.375, 'duration': 4.142}, {'end': 10109.78, 'text': 'Okay, now look at the bigger picture: if a data node fails in between,', 'start': 10102.657, 'duration': 7.123}, {'end': 10118.606, 'text': 'then the data node that was supposed to write the data to the second data node will go back to the client and tell it that the write has failed,', 'start': 10109.78, 'duration': 8.826}, {'end': 10121.528, 'text': 'and ask for the next data node that it should be written to.', 'start': 10118.606, 'duration': 2.922}, {'end': 10124.85, 'text': 'the client will give that information and it will write it on the next data node.', 'start': 10121.528, 'duration': 3.322}, {'end': 10127.892, 'text': 'okay, if the second data node also fails, it goes back to the client.', 'start': 10124.85, 'duration': 3.042}, {'end': 10132.155, 'text': 'the client will be asked for this information and will know which data node has failed.', 'start': 10127.892, 'duration': 4.263}, {'end': 10132.816, 'text': 'is that clear?', 'start': 10132.155, 'duration': 0.661}, {'end': 10133.036, 'text': 'is that?', 'start': 10132.816, 'duration': 0.22}, {'end': 10138.5, 'text': 'is that clear to everyone now?', 'start': 10133.456, 'duration': 5.044}, {'end': 10142.703, 'text': "yes, yes, I don't know.", 'start': 10138.5, 'duration': 4.203}], 'summary': 'If a write to a data node fails, it is reported back as failed and retried on another data node, as explained to Rahul.', 'duration': 44.328, 'max_score': 10098.375, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM10098375.jpg'}, {'end': 10268.794, 'src': 'embed', 'start': 10208.774, 'weight': 6, 'content': [{'end': 10216.459, 'text': 'if this also fails, this one goes back and asks which is the node to be written to, and it does not know that itself.', 'start': 10208.774, 'duration': 7.685}, {'end': 10218.62, 'text': 'so it will go to the name node and ask which one to write to.', 'start': 10216.459, 'duration': 2.161}, {'end': 10220.281, 'text': 'the name node will give this information back.', 'start': 10218.62, 'duration': 1.661}, {'end': 10222.243, 'text': 'this information will come back to the task tracker.', 'start': 10220.281, 'duration': 1.962}, {'end': 10224.104, 'text': 'it will write to the last one.',
'start': 10222.243, 'duration': 1.861}, {'end': 10225.625, 'text': 'now, is this flow clear to everyone?', 'start': 10224.104, 'duration': 1.521}, {'end': 10229.007, 'text': 'can I quickly get a smiley or a yes from everyone?', 'start': 10226.465, 'duration': 2.542}, {'end': 10232.891, 'text': 'if there is a confusion, please write back.', 'start': 10229.007, 'duration': 3.884}, {'end': 10234.072, 'text': 'good, good guys.', 'start': 10232.891, 'duration': 1.181}, {'end': 10234.892, 'text': 'thank you very much.', 'start': 10234.072, 'duration': 0.82}, {'end': 10237.655, 'text': 'thank you very much, so I am assuming that everyone is clear with this.', 'start': 10234.892, 'duration': 2.763}, {'end': 10249.024, 'text': 'okay. so everyone now knows what a posted and a non posted right is good.', 'start': 10237.655, 'duration': 11.369}, {'end': 10258.011, 'text': 'so Vidya Sagar is asking a question that will there be an act back for a write on every node, or is it done for every three nodes?', 'start': 10249.024, 'duration': 8.987}, {'end': 10262.852, 'text': 'So with the way to the client, there will be just one act, okay?', 'start': 10258.551, 'duration': 4.301}, {'end': 10265.193, 'text': 'But there is an act, if you see here right?', 'start': 10263.312, 'duration': 1.881}, {'end': 10268.794, 'text': 'This five, this acknowledgement is coming back from this data node.', 'start': 10265.293, 'duration': 3.501}], 'summary': 'Discussion on data writing process and acknowledgements from nodes.', 'duration': 60.02, 'max_score': 10208.774, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM10208774.jpg'}, {'end': 10649.155, 'src': 'embed', 'start': 10605.012, 'weight': 1, 'content': [{'end': 10607.938, 'text': 'we started with what is big data?', 'start': 10605.012, 'duration': 2.926}, {'end': 10610.34, 'text': 'what is the scenarios where big data is used?', 'start': 10607.938, 'duration': 2.402}, {'end': 10613.442, 'text': 'we also spoke about unstructured and structured data.', 'start': 10610.34, 'duration': 3.102}, {'end': 10614.703, 'text': 'everyone is clear with that.', 'start': 10613.442, 'duration': 1.261}, {'end': 10618.026, 'text': 'keep on asking, answering that with a yes or no.', 'start': 10614.703, 'duration': 3.323}, {'end': 10621.049, 'text': 'okay. so structured, unstructured data we talked about.', 'start': 10618.026, 'duration': 3.023}, {'end': 10624.431, 'text': 'then we went on to understand why do we need what?', 'start': 10621.049, 'duration': 3.382}, {'end': 10626.073, 'text': 'is the challenge behind storing data?', 'start': 10624.431, 'duration': 1.642}, {'end': 10628.795, 'text': "it's not the, it's not the storage space, but the IO.", 'start': 10626.073, 'duration': 2.722}, {'end': 10630.376, 'text': 'that was, second point, which we covered.', 'start': 10628.795, 'duration': 1.581}, {'end': 10632.081, 'text': 'right. 
Right guys?', 'start': 10630.376, 'duration': 1.705}, {'end': 10633.683, 'text': "Don't be frugal in writing.", 'start': 10632.282, 'duration': 1.401}, {'end': 10634.623, 'text': 'yes or no?', 'start': 10633.683, 'duration': 0.94}, {'end': 10640.668, 'text': 'The third thing which we saw was that a distributed file system can be a probable solution.', 'start': 10634.984, 'duration': 5.684}, {'end': 10642.75, 'text': "That's what we saw, correct?", 'start': 10641.669, 'duration': 1.081}, {'end': 10646.993, 'text': 'And then we came to Hadoop and then we saw what is Hadoop all about?', 'start': 10643.61, 'duration': 3.383}, {'end': 10649.155, 'text': 'How does Hadoop solve this problem?', 'start': 10647.753, 'duration': 1.402}], 'summary': "Discussion on big data, structured vs unstructured data, challenges in storing data, distributed file system, and hadoop's role.", 'duration': 44.143, 'max_score': 10605.012, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM10605012.jpg'}, {'end': 10853.707, 'src': 'embed', 'start': 10823.286, 'weight': 9, 'content': [{'end': 10827.548, 'text': "So let's go back to our Facebook example and see if whatever we have learned.", 'start': 10823.286, 'duration': 4.262}, {'end': 10828.349, 'text': 'No problem.', 'start': 10827.909, 'duration': 0.44}, {'end': 10833.552, 'text': "So, no problem Mohan, we'll send you the recording and thank you very much for joining in the class, okay.", 'start': 10828.849, 'duration': 4.703}, {'end': 10839.016, 'text': 'So next we are going to show you a live demo of how a cloud or a VM works,', 'start': 10833.853, 'duration': 5.163}, {'end': 10845.201, 'text': "and what you can quickly do is that after this class go through a drop box link, we'll send you an email with instructions.", 'start': 10839.016, 'duration': 6.185}, {'end': 10846.742, 'text': "you'll have to do that, okay, fine.", 'start': 10845.201, 'duration': 1.541}, {'end': 10853.707, 'text': "So, basically let's go back to our Facebook example now for just one minute and see if we've actually solved the Facebook problem.", 'start': 10847.042, 'duration': 6.665}], 'summary': 'Demonstrate cloud/vm functionality with live demo, followed by email instructions for practical application.', 'duration': 30.421, 'max_score': 10823.286, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM10823286.jpg'}], 'start': 9956.967, 'title': 'Hadoop data processes', 'summary': 'Explains the non-posted write process in hadoop, emphasizing data assumption upon acknowledgement, and the retrieval process in case of failure. 
it also explores hadoop architecture, highlighting parallel data read, fast data retrieval, and pipeline mode for data write.', 'chapters': [{'end': 10208.774, 'start': 9956.967, 'title': 'Hadoop non-posted write process', 'summary': 'Explains the non-posted write process in hadoop, emphasizing that data is only assumed to be written when an acknowledgement is received, and in case of failure, the client will retrieve information on the next node to write to.', 'duration': 251.807, 'highlights': ['The non-posted write process in Hadoop ensures that data is only assumed to be written when an acknowledgement is received.', 'In case of failure, the client will retrieve information on the next node to write to, ensuring data resilience and continuity.', 'The system allows for continuous data pushing into the pipeline without assuming the write has been done until an acknowledgement is received.', 'The implementation in Hadoop resembles a queue system with job and ack queues, allowing for uninterrupted data pushing and acknowledgement-based write assumption.', 'The process involves the task tracker ensuring data is written to a separate node in case of failure, maintaining data integrity and redundancy.']}, {'end': 10845.201, 'start': 10208.774, 'title': 'Hadoop architecture overview', 'summary': 'Explores hadoop architecture, emphasizing the parallel data read process, the importance of fast data retrieval, and the use of pipeline mode for data write.', 'duration': 636.427, 'highlights': ['The file read happens in parallel to ensure fast data retrieval, even if a data node fails.', 'The pipeline mode is used for data write, while the parallel mode is used for data read, as explained in the class.', 'The importance of fast data retrieval and the architecture of Hadoop to achieve it were emphasized throughout the class.', 'The chapter also introduced the concepts of posted and non-posted write, with an explanation using the analogy of sending mail.', 'The instructor encouraged students to compile a knowledge bank of questions and answers, promising it to be available for their lifetime.']}], 'duration': 888.234, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM9956967.jpg', 'highlights': ['The non-posted write process in Hadoop ensures data assumption upon acknowledgement', 'The file read happens in parallel to ensure fast data retrieval', 'The system allows for continuous data pushing into the pipeline without assuming the write has been done until an acknowledgement is received', 'The implementation in Hadoop resembles a queue system with job and ack queues', 'The process involves the task tracker ensuring data is written to a separate node in case of failure', 'In case of failure, the client will retrieve information on the next node to write to', 'The pipeline mode is used for data write, while the parallel mode is used for data read', 'The importance of fast data retrieval and the architecture of Hadoop to achieve it were emphasized throughout the class', 'The chapter also introduced the concepts of posted and non-posted write, with an explanation using the analogy of sending mail', 'The instructor encouraged students to compile a knowledge bank of questions and answers']}, {'end': 12882.235, 'segs': [{'end': 11159.135, 'src': 'embed', 'start': 11128.103, 'weight': 5, 'content': [{'end': 11129.964, 'text': "It's not for the local file system.", 'start': 11128.103, 'duration': 1.861}, {'end': 11138.558, 'text': 'So when you press enter 
over here, okay, so i missed out hyphen ls over here.', 'start': 11129.984, 'duration': 8.574}, {'end': 11147.305, 'text': 'so if you go to list it out, okay, sandeep has a question how to install cloud era vm in my laptop.', 'start': 11138.558, 'duration': 8.747}, {'end': 11150.128, 'text': 'just hold on for a second, sandeep.', 'start': 11147.305, 'duration': 2.823}, {'end': 11152.49, 'text': 'okay, press enter.', 'start': 11150.128, 'duration': 2.362}, {'end': 11154.791, 'text': 'you will have this directory structure.', 'start': 11152.49, 'duration': 2.301}, {'end': 11157.894, 'text': 'you can look at the permissions that it specifies.', 'start': 11154.791, 'duration': 3.103}, {'end': 11159.135, 'text': "okay, that's what i told.", 'start': 11157.894, 'duration': 1.241}], 'summary': 'Demonstrating directory structure and permissions with commands, addressing installation query.', 'duration': 31.032, 'max_score': 11128.103, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM11128103.jpg'}, {'end': 12136.978, 'src': 'embed', 'start': 12060.566, 'weight': 1, 'content': [{'end': 12063.148, 'text': 'He has given me a, no, he has not voted.', 'start': 12060.566, 'duration': 2.582}, {'end': 12067.432, 'text': "So how is it adjourned? I'm really sorry, buddy, whoever that is.", 'start': 12064.069, 'duration': 3.363}, {'end': 12072.335, 'text': "I am really sorry if I've taken so much of your time that you cannot bear me for another five minutes.", 'start': 12067.452, 'duration': 4.883}, {'end': 12073.736, 'text': "But I'll still go ahead.", 'start': 12072.876, 'duration': 0.86}, {'end': 12077.122, 'text': 'The good part is whoever has answered has answered it right.', 'start': 12074.721, 'duration': 2.401}, {'end': 12079.583, 'text': 'There is one person who has refrained from voting.', 'start': 12077.422, 'duration': 2.161}, {'end': 12087.606, 'text': 'Who is that? Can you tell me guys? Who is angry with me? Can that person answer? 
Someone has come up with an answer.', 'start': 12080.343, 'duration': 7.263}, {'end': 12092.348, 'text': 'Well, the answer is Hadoop distributed file system.', 'start': 12088.727, 'duration': 3.621}, {'end': 12095.369, 'text': 'So DFS is distributed file system.', 'start': 12092.788, 'duration': 2.581}, {'end': 12096.95, 'text': 'Okay Okay.', 'start': 12095.649, 'duration': 1.301}, {'end': 12099.771, 'text': 'Whoever got it wrong can quickly confirm that they have understood it now.', 'start': 12096.99, 'duration': 2.781}, {'end': 12104.573, 'text': "Okay So let's come to the next question.", 'start': 12102.592, 'duration': 1.981}, {'end': 12111.983, 'text': 'Sandeep, are you still there??', 'start': 12110.922, 'duration': 1.061}, {'end': 12115.265, 'text': 'Can everyone answer this question quickly?', 'start': 12113.704, 'duration': 1.561}, {'end': 12121.348, 'text': 'I am not going to call out names of the people who got it wrong, because the idea is not to test your knowledge here.', 'start': 12115.285, 'duration': 6.063}, {'end': 12123.39, 'text': 'Great Sandeep, very good.', 'start': 12122.409, 'duration': 0.981}, {'end': 12124.931, 'text': 'With all right answers.', 'start': 12124.01, 'duration': 0.921}, {'end': 12127.852, 'text': 'Who knows you are the guy who did not answer man.', 'start': 12125.591, 'duration': 2.261}, {'end': 12129.493, 'text': 'You are my enemy, I know that.', 'start': 12128.673, 'duration': 0.82}, {'end': 12134.577, 'text': 'okay, okay, so yeah.', 'start': 12132.876, 'duration': 1.701}, {'end': 12136.978, 'text': 'so basically, please, all of you answer.', 'start': 12134.577, 'duration': 2.401}], 'summary': 'Interactive session on hadoop: 1 person refrained from voting, correct answer is hadoop distributed file system.', 'duration': 76.412, 'max_score': 12060.566, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM12060566.jpg'}, {'end': 12882.235, 'src': 'embed', 'start': 12854.557, 'weight': 0, 'content': [{'end': 12855.618, 'text': 'Thanks, Gaurav.', 'start': 12854.557, 'duration': 1.061}, {'end': 12862.144, 'text': 'Okay So all of you guys can leave.', 'start': 12859.802, 'duration': 2.342}, {'end': 12865.127, 'text': 'Thanks for the long class and bye.', 'start': 12862.264, 'duration': 2.863}, {'end': 12876.993, 'text': 'But Sameer, you were asking something? 
Yes, we will be doing a kind of project.', 'start': 12871.01, 'duration': 5.983}, {'end': 12881.074, 'text': 'As we go ahead, we will be solving some real life problems.', 'start': 12877.293, 'duration': 3.781}, {'end': 12882.235, 'text': 'Yes, we will be doing that.', 'start': 12881.295, 'duration': 0.94}], 'summary': 'Discussion on real-life project, solving problems.', 'duration': 27.678, 'max_score': 12854.557, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM12854557.jpg'}], 'start': 10845.201, 'title': "Implementing facebook's data analysis pipeline", 'summary': 'Details the implementation of a data analysis pipeline for fast streaming access at facebook, covers basics of using cloudera vm and hadoop commands, explores hadoop basics and installation, includes a q&a and rapid fire session on hadoop, and concludes with a wrap-up involving feedback and future plans.', 'chapters': [{'end': 10946.727, 'start': 10845.201, 'title': "Facebook's data analysis pipeline", 'summary': "Illustrates the implementation of a data analysis pipeline to solve facebook's problem of quickly reading and analyzing data, emphasizing the priority of fast streaming access over writing data, as the presenter engages the audience in understanding and confirming the solution.", 'duration': 101.526, 'highlights': ['The priority for Facebook is to quickly read and analyze data, rather than writing it, with fast streaming access being crucial, emphasizing the implementation of a data analysis pipeline.', 'The presenter engages the audience in understanding and confirming the solution, prompting for confirmation and feedback, creating an interactive and engaging learning environment.', 'The session concludes with a plan for a live demo of Cloud Era, assuring support for understanding and assigning tasks to familiarize with the system during the week.']}, {'end': 11443.012, 'start': 10947.147, 'title': 'Cloudera vm and hadoop commands', 'summary': 'Covers the basics of using cloudera vm and hadoop commands, including how to download the cloudera vm, use hadoop commands, and navigate the cloudera vm interface, with emphasis on practical examples and clear explanations.', 'duration': 495.865, 'highlights': ['The Cloudera VM and Hadoop commands are being introduced, with emphasis on practical examples and clear explanations.', 'Instructions on downloading the Cloudera VM and using Hadoop commands are provided.', 'Navigating the Cloudera VM interface is demonstrated, including a tour of the web interface and running example programs.']}, {'end': 12136.978, 'start': 11443.052, 'title': 'Hadoop basics and installation', 'summary': 'Covers the basics of hadoop, including a look and feel of the mapreduce environment, with a focus on practical aspects and interaction with students through a quiz to ensure understanding of concepts and installation requirements for the next class.', 'duration': 693.926, 'highlights': ['The chapter covers the basics of Hadoop, including a look and feel of the MapReduce environment.', 'Interaction with students through a quiz to ensure understanding of concepts and installation requirements for the next class.', 'Emphasis on practical aspects and installation requirements for the next class.']}, {'end': 12621.14, 'start': 12136.978, 'title': 'Hadoop q&a and rapid fire session', 'summary': 'Covered a q&a and rapid fire session on hadoop, where participants answered questions on java, mapreduce, hadoop file system, and data nodes, with an 
emphasis on understanding concepts and preparing for interviews.', 'duration': 484.162, 'highlights': ['The session emphasized understanding concepts and preparing for interviews.', 'Participants answered questions on Java, MapReduce, Hadoop file system, and data nodes.', 'The Hadoop file system and data nodes were key topics of discussion.', 'The session included questions that challenged participants and went beyond the class syllabus.']}, {'end': 12882.235, 'start': 12621.52, 'title': 'Interactive online class wrap-up', 'summary': 'Concludes with a call for further self-study, assistance options, and future project plans, following a participative and humorous session involving feedback and support details.', 'duration': 260.715, 'highlights': ['The chapter wraps up with a call for self-study, support options, and future project plans, emphasizing the need for additional learning and assistance to reinforce understanding.', 'Participants are assured of 24/7 support and access to class recordings, with the opportunity for special sessions and assistance via Skype, aiming to address any unclear concepts.', 'A follow-up email will be sent to address queries and provide class recordings, while a Google group will be created for participants to stay connected and a survey to collect feedback for further class improvements.', 'The session involved participant interactions, appreciations, and humor, fostering a participative and engaging environment.', 'The instructor urges participants to engage in self-study and reflection on the topic during the week, highlighting the importance of continuous learning beyond the class.']}], 'duration': 2037.034, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/A02SRdyoshM/pics/A02SRdyoshM10845201.jpg', 'highlights': ['The priority for Facebook is fast streaming access for data analysis', 'The session concludes with a plan for a live demo of Cloud Era', 'The presenter engages the audience in an interactive learning environment', 'The Cloudera VM and Hadoop commands are introduced with practical examples', 'The chapter wraps up with a call for self-study and future project plans', 'The chapter covers the basics of Hadoop and MapReduce environment', 'The session emphasized understanding concepts and preparing for interviews', 'Participants are assured of 24/7 support and access to class recordings', 'The session included questions that challenged participants and went beyond the class syllabus', 'A follow-up email will be sent to address queries and provide class recordings']}], 'highlights': ['The course covers topics such as HDFS, MapReduce, Pig, Hive, Zookeeper, Scoop, HBase, and Java prerequisites for MapReduce programming.', 'The amount of data generated by Facebook is around 500 terabytes per day, making it a significant example of big data.', 'HDFS ensures fault tolerance by replicating data on a minimum of three machines, making it highly fault tolerant.', 'The default replication factor for data in HDFS is 3, which means the data is written to three different data nodes.', 'Hadoop ensures data reliability by waiting for an acknowledgement back from the data node, and all the writes are asynchronous, ensuring efficient and reliable data handling.', 'The challenge of IO speed is emphasized over storage capacity, requiring a distributed file system to handle the large amount of data.', "The session emphasizes the future of IT in analytics and big data, aligning with participants' motivations", 'The need for big data 
analytics', 'The course emphasizes hands-on practice, with the second week dedicated to setting up a Hadoop cluster and running HDFS commands.', 'The instructor, Abhishek, has over 10 years of experience in the IT world and has been working on Hadoop for a couple of years.']}
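The rack and replica placement discussed in the class (where the copies of a block land, and why two of the three copies stay on one rack to reduce cross-rack traffic) can be sketched in a few lines. This is only an illustrative sketch of the default HDFS placement policy for a replication factor of 3, not Hadoop's actual implementation; the cluster map and the function name choose_replica_nodes are made up for the example.

```python
import random

# Hypothetical cluster map: data node -> rack id (illustration only).
NODES = {
    "dn1": "rack1", "dn2": "rack1", "dn3": "rack1",
    "dn4": "rack2", "dn5": "rack2", "dn6": "rack2",
}

def choose_replica_nodes(writer_node, nodes=NODES, replication=3):
    """Sketch of the default placement policy described in class:
    replica 1 on the writer's node, replica 2 on a different rack,
    replica 3 on another node of that second rack."""
    first = writer_node
    first_rack = nodes[first]
    # Second replica: any node on a rack other than the first replica's rack.
    off_rack = [n for n, r in nodes.items() if r != first_rack]
    second = random.choice(off_rack)
    # Third replica: a different node on the same rack as the second replica.
    same_rack = [n for n, r in nodes.items() if r == nodes[second] and n != second]
    third = random.choice(same_rack)
    return [first, second, third][:replication]

if __name__ == "__main__":
    print(choose_replica_nodes("dn2"))   # e.g. ['dn2', 'dn5', 'dn4']
```

Keeping the second and third copies on the same remote rack is what the instructor means by reducing operations: only one copy has to cross racks.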
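The non-posted write and the job/ack-queue idea from the session (keep pushing packets, treat a packet as written only once its acknowledgement comes back, and fall back to another data node when one fails) can likewise be simulated. The DataNode class, write_block function, and spare-node list below are toy, hypothetical names for a pipeline of three nodes; this is a sketch of the behaviour described in class, not the real HDFS client.

```python
import queue

class DataNode:
    """Toy stand-in for a data node; store() returns True when it acks a packet."""
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.blocks = []

    def store(self, packet):
        if not self.healthy:
            return False            # node failed: no acknowledgement comes back
        self.blocks.append(packet)
        return True                 # acknowledgement

def write_block(packets, pipeline, spare_nodes):
    """Non-posted write sketch: a packet stays in the ack queue until every
    node in the pipeline has acknowledged it; a failed node is dropped and a
    spare node (handed out by the name node, in the class's story) takes its
    place before the packet is retried on it."""
    ack_queue = queue.Queue()
    for packet in packets:
        ack_queue.put(packet)                # sent, but not yet assumed written
        pending = list(pipeline)             # nodes that still owe an ack
        while pending:
            node = pending.pop(0)
            if node.store(packet):
                continue                     # ack received from this node
            pipeline.remove(node)            # drop the failed node
            replacement = spare_nodes.pop()  # assumes a spare is available
            pipeline.append(replacement)
            pending.append(replacement)      # retry the packet on the spare
        ack_queue.get()                      # all acks in: the write is now "done"
    return ack_queue.empty()

# One node in the pipeline fails mid-write and is replaced transparently.
pipeline = [DataNode("dn1"), DataNode("dn2", healthy=False), DataNode("dn3")]
print(write_block(["pkt-1", "pkt-2"], pipeline, spare_nodes=[DataNode("dn7")]))  # True
```

A posted write, by contrast, would simply skip the ack queue and assume success as soon as the packet is handed off.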
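Finally, for the Cloudera VM demo, the listing command the instructor runs against HDFS (as opposed to the local file system, and where he first forgets the -ls flag) can be wrapped from Python roughly as below. It assumes the hadoop binary is already on the PATH, as it is inside the Cloudera VM; the helper name hdfs_ls is made up for illustration.

```python
import subprocess

def hdfs_ls(path="/"):
    """Run 'hadoop fs -ls <path>' and return its output lines."""
    result = subprocess.run(
        ["hadoop", "fs", "-ls", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()

if __name__ == "__main__":
    # Inside the VM this prints the HDFS directory listing with the
    # permissions, owner, group and size columns shown in the demo.
    for line in hdfs_ls("/"):
        print(line)
```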