title
Hadoop Multi Node Cluster Setup | Hadoop Installation | Hadoop Administration Tutorial | Edureka

description
Check our Hadoop Multi-node Installation blog here: https://goo.gl/dZPg6S This Hadoop tutorial takes you through the basics of setting up a Hadoop Multi Node Cluster. This tutorial video is specially designed for beginners to learn Hadoop Administration. To attend a live class on Hadoop Administration, click here: http://goo.gl/6xYlp8 This video will help you understand: • Hadoop components and configurations • Modes of a Hadoop Cluster • Hadoop Multi Node cluster • Hands-On The topics related to ‘Hadoop Admin’ have been widely covered in our course. PG in Big Data Engineering with NIT Rourkela : https://www.edureka.co/post-graduate/big-data-engineering (450+ Hrs || 9 Months || 20+ Projects & 100+ Case studies) For more information, please write back to us at sales@edureka.co or call us at IND: 9606058406 / US: 18338555775 (toll free).

detail
Summary: This tutorial covers Hadoop cluster setup, implementing a recommendation engine with big data, Hadoop cluster architecture and best practices, HDFS configuration, property-file configuration, and Hadoop node setup, catering to a mixed audience with varying levels of familiarity with Hadoop.

Chapter 1 (0:02–12:51): Hadoop cluster setup and basics

The session sets up a three-node Hadoop cluster. Since the demo has to run on a single machine, a real-world deployment is replicated as VMs using Oracle's VirtualBox Manager: the same physical servers you would see in production are recreated as instances on the presenter's desktop. Beginners are encouraged to follow the steps one by one and to review the recording afterwards; the session is structured so that anybody can set up their own instances, with the Hadoop components and their configuration covered over the next 45 to 60 minutes.

Audience Q&A:
- How many nodes for a small to mid-sized project? A minimum-sized cluster is the usual starting point for a startup; the chapter's recommendation for a small to mid-sized project is roughly 15 to 30 nodes. The difference between single-node and multi-node setups is also clarified.
- Which Hadoop distribution is used in this session? Open-source Apache Hadoop, version 2.7.1.
- What is the default block size in Hadoop 2.7.1? 128 MB. Hadoop 1 typically used 64 MB; in Hadoop 2 it is 128 MB irrespective of minor version.

For those new to Hadoop, look at it in two ways: storage and processing. Hadoop revolves entirely around those two; it is an application that helps you store data and process data. The chapter relates this to the evolution from Hadoop 1 to the core components of Hadoop 2.x and to real-world use cases such as e-commerce websites. The audience is mixed (newcomers, people who have implemented Hadoop, people who have set up a single-node cluster), and questions are kept relevant to the topics being discussed.
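As a quick check of the block-size claim above, here is a minimal sketch. The install path and HADOOP_HOME value are assumptions for illustration, not from the video; hdfs getconf is a standard Hadoop 2.x command.

# Assumption: Hadoop 2.7.1 is unpacked at this path; adjust to your install.
export HADOOP_HOME=/home/hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

# Prints the effective HDFS block size in bytes; on Hadoop 2.x the default
# is 134217728, i.e. 128 MB (Hadoop 1 defaulted to 64 MB).
hdfs getconf -confKey dfs.blocksize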
Chapter 2 (12:51–23:47): Implementing a recommendation engine with big data; Hadoop components

A recommendation engine is what Amazon implemented first, and they claim a 30% revenue jump in the first month after launch. The point of the example is to understand where Hadoop gets implemented in real-world scenarios. How does an e-commerce site reach such conclusions? By storing historical data. Keeping the last five years of browsing and purchase history is a data problem: that is big data, and storing it on traditional systems is very expensive, requiring a huge investment in storage servers. The value extracted from it is the recommendation itself, based on five years of history, which is what makes the prediction accurate. The same analogy applies to the financial side: in stock trading, companies gather huge amounts of data so they can react whenever news happens.

Looking for a new solution that provides storage plus processing, the industry arrived at Hadoop. The storage is termed HDFS; the processing is MapReduce, whose newer implementation runs on YARN. Each follows a master/slave architecture:
- HDFS: the master component is the NameNode, backed up by a Secondary NameNode; the slave component is the DataNode.
- YARN: the master component is the ResourceManager; the slave component is the NodeManager.

Audience Q&A:
- How is YARN different from MapReduce? YARN does cluster management; MapReduce runs on top of YARN, which takes care of the resource allocation a MapReduce job needs.
- Why is the DataNode a slave? Because the DataNode is where all the data is going to reside.
- Which Java version? Java 1.6 and above is supported; JDK 8 is available now and can be used.
- What is HBase? A NoSQL database, the Hadoop database; it can be integrated into a Hadoop cluster, as can Spark for processing.
- Advantage of YARN over MapReduce? With plain MapReduce you can only write MapReduce programs against the data; a YARN implementation adds a capacity scheduler and with it more flexibility in programming (covered in more depth later).
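Once such a cluster is running, the master/slave split described above can be verified from the command line. A minimal sketch using standard Hadoop 2.x commands (the output naturally depends on your cluster):

# On the master, jps (a JDK tool) should list NameNode, SecondaryNameNode,
# and ResourceManager; on a slave, DataNode and NodeManager.
jps

# HDFS's view of its slaves: live DataNodes, capacity, and per-node usage.
hdfs dfsadmin -report

# YARN's view of its slaves: one NodeManager line per healthy node.
yarn node -list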
Chapter 3 (23:47–30:34): Hadoop cluster architecture and best practices

With an ordinary disk, once you hit capacity you discard it and go for a new one; Hadoop does not have that problem, since you can expand the cluster to whatever size you want. A daemon is a process that keeps running, up all the time; the NameNode and its peers are daemons, and that architecture is what makes the cluster scalable.

For the practical portion, the plan is three to four nodes: the Secondary NameNode runs on the same machine as the NameNode, plus two DataNodes, i.e. one master node and two slave nodes. That is the HDFS layout, and the same applies to YARN: the master node carries the NameNode and ResourceManager, while the slaves carry the DataNodes and NodeManagers.

Hardware guidelines for a typical cluster:
- NameNode: excellent RAM is what matters; the hard disk is not that important. Use multiple cores, multiple Ethernet interfaces, a 64-bit operating system, and a redundant power supply; as is standard in a data center, if one power supply goes, the second is available. The Secondary NameNode is a backup for the NameNode, so its configuration should be exactly identical (it can also be placed on a different node from the NameNode).
- DataNodes: here disk storage is crucial; use multiple disks of large capacity, a Xeon with multiple cores, multiple Ethernet interfaces, and a 64-bit OS.
- Always keep homogeneous software across all machines, so the cluster is easier to debug and maintain as an administrator; do not go heterogeneous.
- When sizing RAM and node configuration, weigh the company's investment, the data capacity, and whether the workload is I/O-bound or CPU-bound; for automatic failover, use a standby node.
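The homogeneity advice lends itself to a quick audit. A minimal sketch, assuming passwordless SSH and the hypothetical hostnames nn1, dn1, and dn2 (names are illustrative, not from the video):

# Hypothetical hostnames; assumes passwordless SSH is already configured.
for host in nn1 dn1 dn2; do
  echo "== $host =="
  # 64-bit check, core count, RAM, and Java version on each node.
  ssh "$host" 'uname -m; nproc; free -g | grep Mem; java -version 2>&1 | head -1'
done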
Chapter 4 (30:34–38:29): NameNode storage, failover, and multi-node prerequisites

The NameNode's metadata is written to two locations: onto disk (the file system edit logs) and into the NameNode's RAM, which is why RAM is so important. If the NameNode cannot accommodate the metadata, it hits the capacity of that RAM and you cannot upload any more data into the cluster. Hadoop also provides tools with which you can upload a file, query the file system, and figure out on which node a particular block resides.

On failover: with a Secondary NameNode, a NameNode crash requires manual intervention to bring things back up; with a standby node, failover is automatic. Configuring one machine on Ubuntu and two on Raspberry Pi is possible, and having two DataNodes matters for redundancy and fault tolerance.

Steps for creating a Hadoop multi-node cluster:
1. Hadoop is a Java-based framework, so make sure Java 1.6 or above (JDK 6+) is installed.
2. Download the Hadoop package.
3. Specify the IP address of each machine, followed by its hostname, in the hosts file. This is required because a lot of intercommunication happens between the nodes, very often: the DataNode talks to the NameNode and the NameNode to the DataNode, so name resolution must work (see the sketch after this chapter).

In the demo, NN1 is the NameNode (an NN2 exists for standby/HA demos but is not started here) and DN1 and DN2 carry the slave daemons. The master daemons run on the NameNode: for HDFS that is the NameNode itself, for YARN the ResourceManager. The slave daemons are the DataNode for HDFS and the NodeManager for YARN.

With a DNS setup, ping to DN1 should work and so should nslookup, in both the forward and reverse lookup zones: an nslookup of the IP address should come back with a resolution. If you do not have DNS, the alternate way is to make the entries in the /etc/hosts file, where localhost points to the local machine and the rest of the mappings are added below it.
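A minimal sketch of step 3 and its verification, using the same hypothetical hostnames and made-up addresses (replace them with your own):

# Made-up addresses and hostnames for illustration; run on every node.
cat <<'EOF' | sudo tee -a /etc/hosts
192.168.56.101  nn1
192.168.56.103  dn1
192.168.56.104  dn2
EOF

# From the NameNode, both reachability and name resolution should work.
ping -c 2 dn1
ping -c 2 dn2
nslookup dn1              # forward lookup (needs an actual DNS server)
nslookup 192.168.56.103   # reverse lookup should resolve back to dn1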
Chapter 5 (38:29–47:12): Downloading Hadoop and configuring HDFS

First check: communication between the NameNode and the DataNodes works, so pinging DN2 gets a response. Java is verified as already installed (installing it is not shown). The next important thing is to download Hadoop; always go for the stable folder on the mirror, because the other folders may have bugs that are fixed in stable. Clicking the link downloads a tarball; untar it and create a soft link for easier access. That is the first step of the cluster setup.

The important configuration files are core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml. If you open them on a fresh install, they are all empty: between the <configuration> tags there is nothing. That state is the standalone mode of installation, in which you can already execute HDFS commands.

To go further, add a property to core-site.xml: fs.defaultFS, with an IP address and a port number, defines where the master HDFS (the NameNode) runs. That minimal configuration is enough to quickly start the cluster; save the file. hdfs-site.xml is also empty by default and gets two entries: dfs.namenode.name.dir indicates the disk location where the NameNode's metadata is stored, and dfs.datanode.data.dir the disk location where the DataNode keeps its data.
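A minimal sketch of the two files as just described. The hostname nn1, port 9000, and the /home/hadoop/data paths are illustrative assumptions; the property names are the standard Hadoop 2.x ones:

# Assumes HADOOP_HOME points at the unpacked Hadoop tree.
cd "$HADOOP_HOME/etc/hadoop"

cat > core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- where the master HDFS (NameNode) listens; host and port are assumptions -->
    <value>hdfs://nn1:9000</value>
  </property>
</configuration>
EOF

cat > hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/data/namenode</value>   <!-- NameNode metadata on disk -->
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/data/datanode</value>   <!-- DataNode block storage -->
  </property>
</configuration>
EOF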
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo2806174.jpg'}], 'start': 2310.492, 'title': 'Setting up hadoop cluster and configuring hdfs in xml files', 'summary': 'Outlines the process of setting up a hadoop cluster including verifying communication between nodes, downloading and installing hadoop, and configuring xml files. it also explains configuring hdfs in xml files, adding properties to core-site and hdfs-site xml, defining property formats, and creating directory structures with specific permissions.', 'chapters': [{'end': 2690.989, 'start': 2310.492, 'title': 'Setting up hadoop cluster', 'summary': 'Outlines the process of setting up a hadoop cluster, including verifying communication between name and data nodes, downloading and installing hadoop, creating soft links, and configuring important xml files, with emphasis on the standalone mode of installation.', 'duration': 380.497, 'highlights': ['Verifying the communication between name and data nodes and ensuring a response when pinging a data node (DN2). Ensuring successful communication between name node and data nodes, such as pinging DN2 and expecting a response.', 'Emphasizing the importance of downloading Hadoop from a stable version and choosing a mirror site, as well as the process of untarring the downloaded tarball and creating soft links for easier access. Highlighting the significance of downloading a stable version of Hadoop, untarring the downloaded tarball, and creating soft links for easier access.', 'Identifying and explaining the location of configuration files, specifically core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml, and noting the empty state of the files in the standalone mode of installation. Detailing the location and importance of configuration files, with particular focus on the standalone mode of installation and the empty state of the files.']}, {'end': 2832.22, 'start': 2691.329, 'title': 'Configuring hdfs in xml files', 'summary': 'Explains the process of configuring hdfs in xml files, including adding properties to the core set xml and hdfs site xml, defining the format of the properties, and creating directory structures with specific permissions.', 'duration': 140.891, 'highlights': ['I need to add a property in the core set XML to define the fs.defaultfs property, indicating the IP address and port number where the master HDFS is running.', 'In the HDFS site XML, I make entries for dfs.namenode.name.dir and dfs.datanode.data.dir to specify the disk locations for storing name nodes metadata and data nodes data respectively.', 'Creating directory structures for name node and data node with specific permissions, ensuring that the data node has permissions set to 7-5-5.']}], 'duration': 521.728, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo2310492.jpg', 'highlights': ['Verifying communication between name and data nodes, such as pinging DN2 and expecting a response.', 'Downloading a stable version of Hadoop, untarring the downloaded tarball, and creating soft links for easier access.', 'Identifying the location and importance of configuration files, particularly in the standalone mode of installation and the empty state of the files.', 'Adding a property in the core-site.xml to define the fs.defaultfs property, indicating the IP address and port number where the master HDFS is running.', 'Making entries in the HDFS site XML for dfs.namenode.name.dir and dfs.datanode.data.dir 
to specify the disk locations for storing the name node metadata and the data node data.', 'Creating directory structures for the name node and data node with specific permissions, ensuring that the data node directory has permissions set to 755.']}, {'end': 3276.726, 'segs': [{'end': 2896.084, 'src': 'embed', 'start': 2865.707, 'weight': 0, 'content': [{'end': 2870.828, 'text': 'Typically with Hadoop 1 you will have to define where your job tracker is running.', 'start': 2865.707, 'duration': 5.121}, {'end': 2874.209, 'text': "But here I'm not running any job tracker.", 'start': 2871.168, 'duration': 3.041}, {'end': 2875.229, 'text': "Here I'm running.", 'start': 2874.209, 'duration': 1.02}, {'end': 2877.559, 'text': "I'm running YARN.", 'start': 2875.939, 'duration': 1.62}, {'end': 2880.92, 'text': "Okay, I'm running on top of YARN, so that is what you're indicating in here.", 'start': 2877.559, 'duration': 3.361}, {'end': 2886.041, 'text': "So you're indicating mapreduce.framework.name, which is yarn.", 'start': 2881.48, 'duration': 4.561}, {'end': 2888.402, 'text': 'So save this file.', 'start': 2887.482, 'duration': 0.92}, {'end': 2896.084, 'text': 'The next important file is your yarn-site.xml, which is also empty by default.', 'start': 2889.302, 'duration': 6.782}], 'summary': 'Configuring Hadoop with the YARN framework instead of a job tracker; yarn-site.xml is empty by default.', 'duration': 30.377, 'max_score': 2865.707, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo2865707.jpg'}, {'end': 3002.348, 'src': 'embed', 'start': 2952.129, 'weight': 1, 'content': [{'end': 2956.472, 'text': 'So I will insert my first slave node, then my second slave node.', 'start': 2952.129, 'duration': 4.343}, {'end': 2958.374, 'text': 'My second slave node is my DN2.', 'start': 2956.652, 'duration': 1.722}, {'end': 2963.738, 'text': 'Get the IP address and add the IP here.', 'start': 2960.335, 'duration': 3.403}, {'end': 2966.28, 'text': 'So I save the slaves file here.', 'start': 2964.839, 'duration': 1.441}, {'end': 2970.46, 'text': 'Now one thing to remember is that with Hadoop 2 you will not have a masters file.', 'start': 2966.877, 'duration': 3.583}, {'end': 2976.986, 'text': "Only with Hadoop 1 will you have a masters file, because that's where you're going to configure your secondary name node.", 'start': 2971.221, 'duration': 5.765}, {'end': 2987.115, 'text': 'Companies typically go for Hadoop 2 because they want to have a standby node rather than a secondary name node.', 'start': 2977.426, 'duration': 9.689}, {'end': 2992.139, 'text': "So whenever you're doing a Hadoop 2 implementation, you should definitely have a standby node.", 'start': 2987.715, 'duration': 4.424}, {'end': 2996.425, 'text': 'Now these configuration files are done.', 'start': 2994.304, 'duration': 2.121}, {'end': 2999.427, 'text': 'Now I am ready to start my cluster.', 'start': 2997.566, 'duration': 1.861}, {'end': 3002.348, 'text': 'So format the cluster.', 'start': 3001.148, 'duration': 1.2}], 'summary': 'Configured the Hadoop cluster with 2 slave nodes, emphasizing the standby node option in Hadoop 2.', 'duration': 50.219, 'max_score': 2952.129, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo2952129.jpg'},
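Sketches of the remaining three files from this passage. mapreduce.framework.name is the property named in the session; the two yarn-site.xml entries shown (the shuffle aux-service and the resource manager host) are the usual minimal pair and are assumed here, since the session only says some properties are required; the slave IPs are placeholders:

# mapred-site.xml -- run MapReduce on top of YARN
cat > $HADOOP_HOME/etc/hadoop/mapred-site.xml <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

# yarn-site.xml -- a commonly used minimal pair of properties (assumed, not from the session)
cat > $HADOOP_HOME/etc/hadoop/yarn-site.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.1.100</value>
  </property>
</configuration>
EOF

# slaves -- one data node IP per line; there is no masters file in Hadoop 2
printf '%s\n' 192.168.1.101 192.168.1.102 > $HADOOP_HOME/etc/hadoop/slaves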
{'end': 3163.317, 'src': 'heatmap', 'start': 3112.58, 'weight': 0.729, 'content': [{'end': 3113.821, 'text': 'That is onto my DN1.', 'start': 3112.58, 'duration': 1.241}, {'end': 3118.011, 'text': 'The next one is to copy this onto my DN2 also.', 'start': 3114.73, 'duration': 3.281}, {'end': 3124.054, 'text': 'Just come to your DN1 and verify the files are there.', 'start': 3121.533, 'duration': 2.521}, {'end': 3129.877, 'text': 'So go inside your Hadoop directory, then inside etc/hadoop.', 'start': 3125.015, 'duration': 4.862}, {'end': 3135.339, 'text': 'So you have your XML files, right? These are timestamped right now.', 'start': 3130.977, 'duration': 4.362}, {'end': 3137.6, 'text': 'So now I know that these are my latest files.', 'start': 3135.579, 'duration': 2.021}, {'end': 3138.821, 'text': 'Okay, just copy it.', 'start': 3138.02, 'duration': 0.801}, {'end': 3141.842, 'text': 'So the same will be copied onto your DN2 also.', 'start': 3139.361, 'duration': 2.481}, {'end': 3152.886, 'text': 'Now one important thing I need to do here is, because this is my data node, I need to create my data node structure.', 'start': 3143.303, 'duration': 9.583}, {'end': 3155.947, 'text': "I don't have to create the data node structure on my name node.", 'start': 3153.486, 'duration': 2.461}, {'end': 3157.647, 'text': 'I think I created it on my name node here.', 'start': 3155.987, 'duration': 1.66}, {'end': 3159.448, 'text': "So we don't need to do it on the name node.", 'start': 3157.948, 'duration': 1.5}, {'end': 3163.317, 'text': "If you're doing a pseudo cluster, then yes, you can do it on the name node.", 'start': 3160.074, 'duration': 3.243}], 'summary': 'Copying files from DN1 to DN2, verifying timestamps, and discussing the data node structure.', 'duration': 50.737, 'max_score': 3112.58, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo3112580.jpg'},
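One way to do the copy described here, sketched with scp (the session simply copies the files across; the user name and IPs are placeholders):

# push the freshly edited configs from the master to DN1 and DN2
for node in 192.168.1.101 192.168.1.102; do
  scp $HADOOP_HOME/etc/hadoop/core-site.xml \
      $HADOOP_HOME/etc/hadoop/hdfs-site.xml \
      $HADOOP_HOME/etc/hadoop/mapred-site.xml \
      $HADOOP_HOME/etc/hadoop/yarn-site.xml \
      edureka@$node:$HADOOP_HOME/etc/hadoop/
done

# on each data node only (not needed on a dedicated name node):
mkdir -p /home/edureka/hadoop_store/hdfs/datanode
chmod 755 /home/edureka/hadoop_store/hdfs/datanode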
{'end': 3255.901, 'src': 'heatmap', 'start': 3206.675, 'weight': 0.781, 'content': [{'end': 3213.837, 'text': 'So the next step, it says: edit the slaves file on the master node, format the name node, and start all services.', 'start': 3206.675, 'duration': 7.162}, {'end': 3219.718, 'text': 'The first thing you do when you purchase a new hard disk or a new computer is format it.', 'start': 3214.757, 'duration': 4.961}, {'end': 3223.016, 'text': "Now, being new to Hadoop, I don't know what commands to run.", 'start': 3220.333, 'duration': 2.683}, {'end': 3224.898, 'text': 'I just run hdfs.', 'start': 3223.476, 'duration': 1.422}, {'end': 3228.322, 'text': 'Okay, I just run hdfs.', 'start': 3226.84, 'duration': 1.482}, {'end': 3231.425, 'text': 'This gives me a listing of all the commands I can run here.', 'start': 3228.642, 'duration': 2.783}, {'end': 3234.969, 'text': 'The first one is namenode -format.', 'start': 3232.126, 'duration': 2.843}, {'end': 3237.531, 'text': "So I'll leverage the same thing.", 'start': 3236.03, 'duration': 1.501}, {'end': 3242.036, 'text': 'I will say hdfs namenode -format.', 'start': 3237.551, 'duration': 4.485}, {'end': 3255.901, 'text': 'So this is where it is formatting your cluster, your name node typically.', 'start': 3252.54, 'duration': 3.361}], 'summary': "To set up Hadoop, format the name node and start all services: run 'hdfs namenode -format'.", 'duration': 49.226, 'max_score': 3206.675, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo3206675.jpg'}, {'end': 3255.901, 'src': 'embed', 'start': 3220.333, 'weight': 3, 'content': [{'end': 3223.016, 'text': "Now, being new to Hadoop, I don't know what commands to run.", 'start': 3220.333, 'duration': 2.683}, {'end': 3224.898, 'text': 'I just run hdfs.', 'start': 3223.476, 'duration': 1.422}, {'end': 3228.322, 'text': 'Okay, I just run hdfs.', 'start': 3226.84, 'duration': 1.482}, {'end': 3231.425, 'text': 'This gives me a listing of all the commands I can run here.', 'start': 3228.642, 'duration': 2.783}, {'end': 3234.969, 'text': 'The first one is namenode -format.', 'start': 3232.126, 'duration': 2.843}, {'end': 3237.531, 'text': "So I'll leverage the same thing.", 'start': 3236.03, 'duration': 1.501}, {'end': 3242.036, 'text': 'I will say hdfs namenode -format.', 'start': 3237.551, 'duration': 4.485}, {'end': 3255.901, 'text': 'So this is where it is formatting your cluster, your name node typically.', 'start': 3252.54, 'duration': 3.361}], 'summary': 'New to Hadoop: learning the hdfs commands, including namenode -format.', 'duration': 35.568, 'max_score': 3220.333, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo3220333.jpg'}],
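The two commands being run in this segment, written out. Formatting is a one-time step on the master before the first start; rerunning it later wipes the name node metadata:

hdfs                     # with no arguments, prints the list of subcommands
hdfs namenode -format    # initializes dfs.namenode.name.dir for a new cluster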
'start': 2833.12, 'title': 'Configuring hadoop', 'summary': 'Covers configuring the Hadoop property files for HDFS, MapReduce, and YARN, including specific properties such as mapreduce.framework.name, and details the process of setting up a Hadoop 2 cluster with emphasis on key setup steps.', 'chapters': [{'end': 2919.334, 'start': 2833.12, 'title': 'Configuring hadoop property files', 'summary': 'Covers configuring the HDFS, MapReduce, and YARN property files, including making entries, adding properties, and indicating the framework used, with specific mention of mapreduce.framework.name and running on top of YARN.', 'duration': 86.214, 'highlights': ['The chapter covers configuring the HDFS, MapReduce, and YARN property files, including making entries, adding properties, and indicating the framework used, with specific mention of mapreduce.framework.name and running on top of YARN.', 'One property to add in mapred-site.xml indicates what framework MapReduce is going to use, with specific mention of mapreduce.framework.name and yarn.', 'yarn-site.xml also requires adding properties; the session notes that some of them are required.', 'The chapter emphasizes the need to define where the job tracker would run, noting that in this case everything runs on top of YARN.']}, {'end': 3276.726, 'start': 2919.334, 'title': 'Configuring hadoop cluster', 'summary': 'Details the process of configuring a Hadoop 2 cluster, including setting up slave nodes, copying configuration files, creating the data node structure, and formatting the name node, ultimately emphasizing the importance of these steps in the cluster setup.', 'duration': 357.392, 'highlights': ['Setting up DN1 and DN2 as slave nodes, each with their IP addresses added to the slaves file.', 'Emphasizing the need for a standby node in a Hadoop 2 implementation, rather than a secondary name node, and ensuring the slave nodes are configured properly to know the master.', 'Copying the essential XML configuration files (core-site, hdfs-site, mapred-site, and yarn-site) to the slave nodes and creating the data node structure on the slave nodes, ensuring the proper functioning of the slave nodes in the cluster.', "Formatting the name node using the 'hdfs namenode -format' command to create the inode entries and tables for the cluster."]}], 'duration': 443.606, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo2833120.jpg', 'highlights': ['The chapter covers configuring HDFS, MapReduce, and YARN property files, including specific properties such as mapreduce.framework.name and running on top of YARN.', 'Setting up slave nodes and adding them to the slaves file, such as DN1 and DN2, with their respective IP addresses.', 'Emphasizing the need for a standby node in a Hadoop 2 implementation, rather than a secondary name node, and ensuring the slave nodes are configured properly to know the master.', "Formatting the name node using the 'hdfs namenode -format' command to create the inode entries and tables for the cluster."]}, {'end': 3733.19, 'segs': [{'end': 3319.879, 'src': 'embed', 'start': 3277.046, 'weight': 0, 'content': [{'end': 3285.991, 'text': 'You remember, this is the directory we have given in your hdfs-site.xml as your dfs.namenode.name.dir.', 'start': 3277.046, 'duration': 8.945}, {'end': 3289.214, 'text': 'This is the value of that property.', 'start': 3286.272, 'duration': 2.942}, {'end': 3291.676, 'text': 'So your format is successful.', 'start': 3290.295, 'duration': 1.381}, {'end': 3294.578, 'text': 'The next important thing is to start the instances.', 'start': 3292.276, 'duration': 2.302}, {'end': 3298.582, 'text': 'So how do you start the instances? We go by two start scripts.', 'start': 3295.119, 'duration': 3.463}, {'end': 3301.604, 'text': 'One is start-dfs.sh.', 'start': 3298.802, 'duration': 2.802}, {'end': 3319.879, 'text': 'So this is starting your name node.', 'start': 3318.499, 'duration': 1.38}], 'summary': 'Configured directory for dfs.namenode.name.dir in hdfs-site.xml; started instances including the name node.', 'duration': 42.833, 'max_score': 3277.046, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo3277046.jpg'}, {'end': 3431.037, 'src': 'embed', 'start': 3398.227, 'weight': 1, 'content': [{'end': 3400.889, 'text': 'What is JPS? JPS is a Java process monitor.', 'start': 3398.227, 'duration': 2.662}, {'end': 3406.953, 'text': 'Okay, JPS is a Java process monitor; try to do a jps on your DN2.', 'start': 3401.409, 'duration': 5.544}, {'end': 3409.514, 'text': 'So the data nodes have not started here.', 'start': 3407.253, 'duration': 2.261}, {'end': 3411.976, 'text': "So let's quickly look at how to troubleshoot this.", 'start': 3409.634, 'duration': 2.342}, {'end': 3418.551, 'text': "So I have my master daemons up on my name node, but they're not started on my data node.", 'start': 3412.688, 'duration': 5.863}, {'end': 3424.214, 'text': "So what could be the reason here? Let's quickly look at the log files.", 'start': 3418.591, 'duration': 5.623}, {'end': 3431.037, 'text': 'So where do you have the log files? You have the log files in your Hadoop install location.', 'start': 3424.294, 'duration': 6.743}], 'summary': 'JPS is a Java process monitor. Data nodes not started; troubleshoot with the log files in the Hadoop install location.', 'duration': 32.81, 'max_score': 3398.227, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo3398227.jpg'},
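The start-and-verify sequence this chapter walks through, as a sketch. Run from the master, assuming $HADOOP_HOME/sbin is on the PATH; jps ships with the JDK:

start-dfs.sh     # starts the name node, data nodes, and secondary name node
start-yarn.sh    # starts the resource manager and the node managers
jps              # on the master, expect NameNode and ResourceManager;
                 # on each slave, expect DataNode and NodeManager.
                 # If DataNode is missing on a slave, check the logs as described next.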
{'end': 3499.664, 'src': 'embed', 'start': 3468.03, 'weight': 2, 'content': [{'end': 3480.801, 'text': 'How can you overcome this? You can go inside home, Edureka, then Hadoop store, HDFS, then your data node two.', 'start': 3468.03, 'duration': 12.771}, {'end': 3484.061, 'text': 'So you already have some information here.', 'start': 3481.941, 'duration': 2.12}, {'end': 3492.183, 'text': 'Just remove this metadata, and then, since I need to do some troubleshooting, start the processes individually.', 'start': 3484.061, 'duration': 8.122}, {'end': 3499.664, 'text': 'So how do I do that individually? hadoop-daemon.sh will start my data node process.', 'start': 3492.423, 'duration': 7.241}], 'summary': 'Troubleshoot by starting the Hadoop data node process individually.', 'duration': 31.634, 'max_score': 3468.03, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo3468030.jpg'},
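The fix demonstrated here, as commands run on the affected slave. The path is the assumed layout from earlier; clearing it discards that node's stored blocks, which is acceptable on a brand-new cluster but not on one already carrying data:

rm -rf /home/edureka/hadoop_store/hdfs/datanode/*   # remove the stale metadata
hadoop-daemon.sh start datanode                     # start the data node individually
yarn-daemon.sh start nodemanager                    # likewise for the YARN slave daemon
jps                                                 # DataNode and NodeManager should now appear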
{'end': 3676.904, 'src': 'embed', 'start': 3609.57, 'weight': 3, 'content': [{'end': 3618.595, 'text': "192.168..., so you're accessing your GUI, the name node admin GUI, on port 50070.", 'start': 3609.57, 'duration': 9.025}, {'end': 3620.296, 'text': 'So guys, I know we are short of time.', 'start': 3618.595, 'duration': 1.701}, {'end': 3621.818, 'text': 'Just give me two minutes.', 'start': 3620.817, 'duration': 1.001}, {'end': 3624.059, 'text': 'We should be able to see the setup here.', 'start': 3622.258, 'duration': 1.801}, {'end': 3628.363, 'text': "Okay, so this is what we'll show you.", 'start': 3626.181, 'duration': 2.182}, {'end': 3636.81, 'text': 'In effect, the cluster setup: the total configured capacity is my 36 GB, and I have my two live nodes.', 'start': 3628.383, 'duration': 8.427}, {'end': 3639.952, 'text': 'Okay, two data nodes which are configured in here.', 'start': 3637.45, 'duration': 2.502}, {'end': 3645.396, 'text': 'Either I can go from here and see what all the nodes are, my DN1 and my DN2.', 'start': 3639.972, 'duration': 5.424}, {'end': 3647.138, 'text': 'So two nodes', 'start': 3646.377, 'duration': 0.761}, {'end': 3649.121, 'text': 'set up for my cluster.', 'start': 3648.16, 'duration': 0.961}, {'end': 3655.387, 'text': 'So this is how you build a multi-node cluster and start working on it.', 'start': 3650.142, 'duration': 5.245}, {'end': 3664.417, 'text': 'Typically you can also run a Hadoop DFS admin report.', 'start': 3661.534, 'duration': 2.883}, {'end': 3669.462, 'text': 'You can do an hdfs dfsadmin -report,', 'start': 3665.177, 'duration': 4.285}, {'end': 3676.904, 'text': 'which will give you the same details as your NameNode user interface.', 'start': 3670.82, 'duration': 6.084}], 'summary': 'Cluster setup with a total configured capacity of 36 GB and two live nodes.', 'duration': 67.334, 'max_score': 3609.57, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo3609570.jpg'}], 'start': 3277.046, 'title': 'Hadoop name and data node setup', 'summary': 'Covers setting up the Hadoop name and data nodes, including starting instances, setting up SSH keys, verifying processes, and troubleshooting instances not starting on data nodes. It also details troubleshooting steps for the data node setup, including removing metadata, starting processes individually, and accessing the cluster setup, with a total configured capacity of 36 GB and two live nodes.', 'chapters': [{'end': 3418.551, 'start': 3277.046, 'title': 'Setting up hadoop name and data nodes', 'summary': 'Covers setting up the Hadoop name and data nodes, including starting instances, setting up SSH keys, verifying processes, and troubleshooting instances not starting on data nodes.', 'duration': 141.505, 'highlights': ['The chapter covers setting up the Hadoop name and data nodes, including starting instances, setting up SSH keys, verifying processes, and troubleshooting instances not starting on data nodes.', 'The value of dfs.namenode.name.dir in hdfs-site.xml is discussed as the directory for the name node.', 'Setting up SSH keys is mentioned as a prerequisite for starting instances to avoid repetitive password prompts.', 'The process of verifying whether the instances have started using jps is explained, with a mention of a successful start of the master daemons but not the data nodes.', 'Troubleshooting the issue of data nodes not starting is highlighted as a key aspect of the chapter.']}, {'end': 3733.19, 'start': 3418.591, 'title': 'Troubleshooting hadoop data node setup', 'summary': 'Details troubleshooting steps for the Hadoop data node setup, including removing metadata, starting processes individually, and accessing the cluster setup, with a total configured capacity of 36 GB and two live nodes.', 'duration': 314.599, 'highlights': ['The chapter demonstrates troubleshooting steps for the Hadoop data node setup, including removing metadata and starting processes individually, to ensure the data node and node manager are up and running.', 'The transcript provides practical steps for accessing the cluster setup, revealing a total configured capacity of 36 GB and two live nodes, which is essential for building a multi-node cluster.', 'The speaker explains the significance of accessing specific ports for different Hadoop processes, such as 50070 for the NameNode and 8088 for the resource manager, to utilize the associated web services and user interfaces.', 'The process of running hdfs dfsadmin -report to obtain the same details as the NameNode user interface is highlighted as a valuable tool for monitoring the Hadoop setup and configuration.']}], 'duration': 456.144, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo3277046.jpg'},
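The two verification routes mentioned in this chapter, assuming the placeholder master IP used throughout:

hdfs dfsadmin -report    # configured capacity, live nodes, and per-node usage,
                         # the same details as the NameNode web interface
# NameNode web UI:        http://192.168.1.100:50070
# ResourceManager web UI: http://192.168.1.100:8088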
{'end': 4636.308, 'segs': [{'end': 3762.426, 'src': 'embed', 'start': 3733.85, 'weight': 1, 'content': [{'end': 3735.611, 'text': 'DN1 here and DN2.', 'start': 3733.85, 'duration': 1.761}, {'end': 3743.212, 'text': 'So this is how you set up a multi-node cluster, and I think as part of this I showed you how to troubleshoot also.', 'start': 3736.891, 'duration': 6.321}, {'end': 3745.693, 'text': 'I mean, we got an opportunity to look at the log files.', 'start': 3743.232, 'duration': 2.461}, {'end': 3750.714, 'text': 'So the log files are always located in the location where you have your Hadoop installed.', 'start': 3746.133, 'duration': 4.581}, {'end': 3752.454, 'text': 'Well, you can change the log location.', 'start': 3750.894, 'duration': 1.56}, {'end': 3762.426, 'text': 'You remember our hadoop-env.sh? We talked about how within that file you can change the log location, the process IDs, and everything.', 'start': 3753.26, 'duration': 9.166}], 'summary': 'Setting up a multi-node cluster with log file troubleshooting in Hadoop.', 'duration': 28.576, 'max_score': 3733.85, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo3733850.jpg'},
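If you do relocate the logs, hadoop-env.sh is where the change goes; a one-line sketch with an assumed target directory:

# in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export HADOOP_LOG_DIR=/var/log/hadoop    # default is $HADOOP_HOME/logs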
{'end': 3802.957, 'src': 'embed', 'start': 3777.947, 'weight': 0, 'content': [{'end': 3786.03, 'text': 'Over a period of four weeks, eight classes for 24 hours, we go in-depth into it and understand how to set up a cluster.', 'start': 3777.947, 'duration': 8.083}, {'end': 3789.672, 'text': 'I think some of you are asking how to set up a passwordless key as such.', 'start': 3786.631, 'duration': 3.041}, {'end': 3792.273, 'text': 'Everything will be shown to you, each and every step.', 'start': 3789.812, 'duration': 2.461}, {'end': 3797.715, 'text': "We are short of time right now, so that's why I cannot go into each and every topic,", 'start': 3793.113, 'duration': 4.602}, {'end': 3802.957, 'text': 'but as we go through these sessions, you can definitely cover each and every topic.', 'start': 3797.715, 'duration': 5.242}], 'summary': 'Covered eight classes over 24 hours, focusing on setting up clusters and passwordless key setup.', 'duration': 25.01, 'max_score': 3777.947, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo3777947.jpg'}, {'end': 4045.631, 'src': 'embed', 'start': 4015.342, 'weight': 4, 'content': [{'end': 4018.404, 'text': "I think it's an unwritten rule.", 'start': 4015.342, 'duration': 3.062}, {'end': 4021.106, 'text': 'You can say which configuration files you need to make.', 'start': 4018.404, 'duration': 2.702}, {'end': 4028.451, 'text': 'So you need to make sure you set whatever files and requisite properties I have set here.', 'start': 4021.206, 'duration': 7.245}, {'end': 4034.935, 'text': 'Where can you configure the standby node? Yes, the standby node can be configured in your hdfs-site.xml.', 'start': 4030.973, 'duration': 3.962}, {'end': 4041.369, 'text': 'You need to add a property, dfs.nameservices, which will indicate where your standby node will be running.', 'start': 4036.147, 'duration': 5.222}, {'end': 4045.631, 'text': 'How can we access another cluster from another machine?', 'start': 4042.25, 'duration': 3.381}], 'summary': 'Configuration files and properties must be set, including dfs.nameservices for the standby node in hdfs-site.xml.', 'duration': 30.289, 'max_score': 4015.342, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo4015342.jpg'}, {'end': 4229.507, 'src': 'embed', 'start': 4202.979, 'weight': 6, 'content': [{'end': 4209.821, 'text': "Do you mean the Job Tracker and Task Tracker daemons don't need to be installed? Absolutely, Snehal.", 'start': 4202.979, 'duration': 6.842}, {'end': 4218.404, 'text': "If you have Hadoop 2 running, you have YARN running, then yeah, you don't need to install the Job Tracker and Task Tracker because it's all taken care of.", 'start': 4210.461, 'duration': 7.943}, {'end': 4225.966, 'text': 'Whatever MapReduce jobs you are submitting will in turn run on your YARN framework.', 'start': 4218.544, 'duration': 7.422}, {'end': 4229.507, 'text': 'So it runs on top of YARN; it is built into YARN.', 'start': 4226.506, 'duration': 3.001}], 'summary': 'In Hadoop 2, YARN replaces the Job Tracker and Task Tracker; MapReduce jobs run on the YARN framework.', 'duration': 26.528, 'max_score': 4202.979, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo4202979.jpg'}, {'end': 4293.53, 'src': 'embed', 'start': 4269.708, 'weight': 3, 'content': [{'end': 4277.533, 'text': "It's always good practice that the number of nodes in your cluster be greater than the replication factor.", 'start': 4269.708, 'duration': 7.825}, {'end': 4285.663, 'text': "If you go below that, you will have a problem: let's say, for example,", 'start': 4278.174, 'duration': 7.489}, {'end': 4288.446, 'text': 'you have 3x replication and three slave nodes.', 'start': 4285.663, 'duration': 2.783}, {'end': 4293.53, 'text': 'And 3x replication will make sure three copies of the data are written to three nodes.', 'start': 4289.146, 'duration': 4.384}], 'summary': 'Cluster nodes should outnumber the replication factor to ensure data redundancy and reliability.', 'duration': 23.822, 'max_score': 4269.708, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo4269708.jpg'},
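The standby-node property named in the Q&A above, as it would sit inside the <configuration> element of hdfs-site.xml. This is only the naming piece: a working standby (HA) setup also needs name node IDs, RPC addresses, journal nodes, and failover settings that the session does not cover, and the service name here is illustrative:

<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>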
{'end': 4481.278, 'src': 'embed', 'start': 4443.019, 'weight': 7, 'content': [{'end': 4446.181, 'text': 'Any reference book for Admin? Yeah, there are a lot of books available.', 'start': 4443.019, 'duration': 3.162}, {'end': 4452.946, 'text': 'How many nodes?', 'start': 4451.485, 'duration': 1.461}, {'end': 4455.608, 'text': 'What configuration of nodes?', 'start': 4452.946, 'duration': 2.662}, {'end': 4463.133, 'text': 'Yeah, as part of the course, we do have one session dedicated to cluster configuration,', 'start': 4456.748, 'duration': 6.385}, {'end': 4476.136, 'text': 'where we talk about how to decide on which nodes to take and how to do your cluster capacity planning.', 'start': 4463.792, 'duration': 12.344}, {'end': 4481.278, 'text': 'The question is: what is Zookeeper?', 'start': 4479.137, 'duration': 2.141}], 'summary': 'Various books are available for admins, with a dedicated session on cluster configuration and capacity planning. Zookeeper is discussed.', 'duration': 38.259, 'max_score': 4443.019, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo4443019.jpg'}, {'end': 4525.957, 'src': 'embed', 'start': 4504.333, 'weight': 2, 'content': [{'end': 4513.375, 'text': 'because resources need to be properly distributed and shared across multiple machines.', 'start': 4504.333, 'duration': 9.042}, {'end': 4519.935, 'text': 'So to avoid running into these race conditions and to avoid multiple issues, you have Zookeeper implemented there.', 'start': 4514.153, 'duration': 5.782}, {'end': 4525.957, 'text': "So that's kind of a gatekeeper which will handle all your client requests appropriately and route them properly.", 'start': 4520.055, 'duration': 5.902}], 'summary': 'Zookeeper ensures proper resource distribution and routing of client requests.', 'duration': 21.624, 'max_score': 4504.333, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo4504333.jpg'}], 'start': 3733.85, 'title': 'Hadoop cluster setup and troubleshooting', 'summary': "Covers setting up a multi-node cluster, troubleshooting, and upcoming course details with a 24-hour module, Q&A session insights, and Zookeeper's role, providing comprehensive information for Hadoop cluster setup and management.", 'chapters': [{'end': 3820.914, 'start': 3733.85, 'title': 'Setting up a multi-node cluster and troubleshooting in hadoop', 'summary': 'Covers setting up a multi-node cluster, troubleshooting, and upcoming course details, including a 24-hour module with in-depth coverage of setting up a cluster and a passwordless key, over four weeks with eight classes.', 'duration': 87.064, 'highlights': ['The upcoming course on 7th November will cover a total of 24 hours over a period of four weeks, with eight classes and in-depth coverage of setting up a cluster and a passwordless key.', 'The chapter explains how to set up a multi-node cluster and troubleshoot, including looking at log files and changing the log location within the Hadoop installation.', 'The course will demonstrate each step of setting up a cluster, troubleshooting, and bringing up an instance, ensuring understanding of the process.', 'The chapter mentions the opportunity to look at log files and assures participants of detailed guidance through the sessions.']}, {'end': 4181.205, 'start': 3821.234, 'title': 'Hadoop cluster setup q&a', 'summary': 'Provides insights into hadoop cluster setup, covering topics such as formatting disks, cluster configuration, data node and name node definitions, block size configurations, and
replication factors.', 'duration': 359.971, 'highlights': ['Formatting disks for cluster setup involves specifying the storage directory and metadata location, with only that disk or location being formatted.', 'The definitions of data node and name node are explained, where the name node is the master node storing metadata and the data node is the slave node where the data resides.', 'Insights into block size configuration, with an example illustrating the impact of block size on the number of blocks for a given file.', 'The default replication factor in a Hadoop cluster setup is three, ensuring data redundancy and fault tolerance.', 'Guidance on configuring a standby node in hdfs-site.xml for fault tolerance and high availability.']}, {'end': 4442.599, 'start': 4182.377, 'title': 'Hadoop q&a session highlights', 'summary': 'Covers Hadoop-related queries, including the installation of Hadoop 2, the impact of the replication factor on slave nodes, and the management of compute and storage with default and overridden values.', 'duration': 260.222, 'highlights': ["Rajit's query about the impact of the replication factor on slave nodes: it's advisable to have the number of nodes in the cluster greater than the replication factor to avoid data replication issues, such as the failure to replicate across an insufficient number of nodes.", 'Explanation of the installation of Hadoop 2 and the role of the Job Tracker and Task Tracker daemons: in Hadoop 2, with YARN running, the installation of the Job Tracker and Task Tracker daemons is unnecessary as they are taken care of by the framework, and MapReduce jobs run on top of YARN.', 'Clarification on the management of compute and storage with the replication factor: default values for the replication factor and block size are 3 and 128 MB, respectively. Explicitly calling out the properties can override these default values.']},
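A worked version of those defaults, plus how the overrides look. At the 128 MB default block size, a 1 GB file splits into 1024 / 128 = 8 blocks, and each block is stored 3 times under the default replication factor. Both properties go inside the <configuration> element of hdfs-site.xml; the values below (replication 2 to match this two-slave demo, a 256 MB block size) are illustrative:

<property>
  <name>dfs.replication</name>
  <value>2</value>            <!-- default is 3; keep it no higher than the data node count -->
</property>
<property>
  <name>dfs.blocksize</name>
  <value>268435456</value>    <!-- 256 MB in bytes; the default is 128 MB -->
</property>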
{'end': 4636.308, 'start': 4443.019, 'title': 'Cluster configuration and zookeeper in hadoop 2', 'summary': "Discusses cluster capacity planning, Zookeeper's role as a gatekeeper in a distributed file system, and configuration of the replication factor via dfs.replication in hdfs-site.xml, providing essential insights for setting up a Hadoop cluster.", 'duration': 193.289, 'highlights': ['The role of Zookeeper as a gatekeeper in a distributed file system is crucial to avoiding race conditions and ensuring proper resource distribution across multiple machines.', 'The session includes a dedicated portion on cluster configuration, covering how to decide on the nodes to take and capacity planning, which is essential for setting up a Hadoop cluster.', 'The configuration of the replication factor via dfs.replication in hdfs-site.xml is highlighted as a key step in the process of setting up a Hadoop cluster.']}], 'duration': 902.458, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/-YEcJquYsFo/pics/-YEcJquYsFo3733850.jpg', 'highlights': ['The upcoming course on 7th November will cover a total of 24 hours over a period of four weeks, with eight classes and in-depth coverage of setting up a cluster and a passwordless key.', 'The chapter explains how to set up a multi-node cluster and troubleshoot, including looking at log files and changing the log location within the Hadoop installation.', 'The role of Zookeeper as a gatekeeper in a distributed file system is crucial to avoiding race conditions and ensuring proper resource distribution across multiple machines.', 'The default replication factor in a Hadoop cluster setup is three, ensuring data redundancy and fault tolerance.', 'Guidance on configuring a standby node in hdfs-site.xml for fault tolerance and high availability.', "Rajit's query about the impact of the replication factor on slave nodes: it's advisable to have the number of nodes in the cluster greater than the replication factor to avoid data replication issues, such as the failure to replicate across an insufficient number of nodes.", 'In Hadoop 2, with YARN running, the installation of the Job Tracker and Task Tracker daemons is unnecessary as they are taken care of by the framework, and MapReduce jobs run on top of YARN.', 'The session includes a dedicated portion on cluster configuration, covering how to decide on the nodes to take and capacity planning, which is essential for setting up a Hadoop cluster.']}], 'highlights': ["The default block size in Hadoop 2.7.1 is 128 MB, providing precise technical information and insights into Hadoop's storage capabilities.", 'Amazon claimed a 30% revenue increase in the first month after implementing the recommendation engine.', 'The concept of daemons in Hadoop allows for the scalability of the cluster, enabling the addition of nodes as needed.', 'The typical Hadoop cluster architecture includes a master node, which contains the name node and the resource manager, and slave nodes, which contain the data nodes and node managers.', 'Importance of RAM in HDFS storage for metadata and file storage node identification.', 'Verifying communication between name and data nodes, such as pinging DN2 and expecting a response.', 'The chapter covers configuring HDFS, MapReduce, and YARN property files, including specific properties such as mapreduce.framework.name and running on top of YARN.', 'The upcoming course on 7th November will cover a total of 24 hours over a period of
four weeks, with eight classes and in-depth coverage of setting up a cluster and passwordless key.']}